Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
Yu Lin, Norihiro Sakamoto
Department of Sociomedical Informatics, Graduate School of Medicine, Kobe University
2008/02 InterOntology08

Agenda
• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
   • Methodology
   • Testing
• Discussion

Search “Genetic Susceptibility” in UMLS

Scope of “GSF to Diabetes Mellitus”
Those genetic characteristics, and interactions between genetic and environmental factors, which increase the probability of developing diabetes mellitus (DM).
• polymorphism
• linked loci
• SNP
• haplotype
• genotype
If they “decrease” the probability, then they are “resistance” factors.

Mendelian Disease vs. Complex Disease
Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease. Nature. 2005 Jun 2;435(7042):584-9. Review.

How to confirm the GSF
• Through a combined family-based linkage study and population-based association study
• Through a combined genetic (gene-by-gene, function-candidate) association approach and genome-wide association approach
• Through a combined statistical study and biological function study

Factors Affecting Statistical Power of Confirming GSF
• Number of disease variants
• Allele frequencies among the population
• Effect size on disease phenotype
   • Odds Ratio (OR)
• Population structure and geography
• Selection bias
• Genotype and phenotype misclassification errors
Ref: [Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.

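To make the link between effect size, sample size and statistical power concrete, here is a minimal sketch of how an odds ratio and its 95% confidence interval can be computed from a 2x2 case-control table. It uses the standard Wald (Woolf) approximation on the log odds ratio. The function name and all counts are hypothetical and are not taken from any study cited in these slides.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: cases carrying the variant      b: cases not carrying it
    c: controls carrying the variant   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: identical carrier proportions, two sample sizes.
large = odds_ratio_ci(60, 40, 45, 55)
small = odds_ratio_ci(12, 8, 9, 11)
for label, (or_, lo, hi) in (("large", large), ("small", small)):
    print(f"{label}: OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Both tables give the same odds ratio, but only the larger sample yields an interval that excludes 1. This is one way to see why sample size appears alongside the OR when confirmation criteria are discussed.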
No Criteria Established
• There are no established criteria for confirming GSF (Genetic Susceptibility Factors)
   • OR 1.5-2.0?
   • sample size
   • population
• Can we settle this?

A Knowledge Base is Needed
• The primary idea is to catalog all GSF to Diabetes Mellitus (DM)
• The reality of research on GSF to DM:
   • Different levels of genetic object
   • Different types of study design
   • Inconsistent results
   • Complex phenotypes of DM
• Versatile datasets demand a knowledge base on this topic

Ontology in General
• Originally from philosophy
• An ontology is a “specification of a shared conceptualization” [Gruber T.]
• Ontology as an approach to “annotation of multiple bodies of data” [Smith B. et al]
• Widely used in computer science and information science as a form of knowledge representation:
   • artificial intelligence
   • the Semantic Web
   • software engineering
   • biomedical informatics (“Gene Ontology as a successful example”)
   • library science
   • information architecture
Ref: http://en.wikipedia.org/wiki/Ontology_%28computer_science%29

Ontology is a Good Tool
• In our case, ontology can help with:
   • Knowledge representation
   • Database design
   • Content-oriented analysis
   • Information retrieval and extraction
   • Information integration
• By setting rules, can we establish criteria to demonstrate either genetic susceptibility or causality in complex disease?

Agenda
• What are the Genetic Susceptibility Factors (GSF)
• How do we confirm genetic susceptibility
• Why do we need an ontology
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
   • Methodology
   • Testing
• Discussion

The Methodology of OGSF-DM
• Specification: specify the domain and scope
• Conceptualization: build the conceptual model
• Integration: reuse and import other ontologies
• Implementation, evaluation: Protégé 3.3.1, OWL, SWRL rules

Step1. Specification
• Domain: represent the knowledge of GSF to DM and related phenotypes
• Explore relevant literature resources:
   • PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
   • Books: Joslin’s Diabetes Mellitus; Human Molecular Genetics 3
• The most fundamental terms:
   i) Human disease: diabetes mellitus and related disorders;
   ii) Phenotypes and observed quantity parameters;
   iii) Genetic concepts;
   iv) Geographical regions;
   v) Disease gene study of the original paper.

Step2. Conceptualization
• The core conception was generated by analyzing the titles of the corpus
• The conception shows an N-ary relationship

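An N-ary relationship cannot be stated as a single binary property, so a common modelling pattern is to reify it: the relationship becomes an object of its own, linked to each participant by a binary property. The sketch below illustrates that pattern in plain Python. The property and individual names are the ones used in the worked example later in these slides; the `ObservedRelationship` dataclass and its field names are hypothetical and are not part of OGSF-DM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedRelationship:
    """One reified N-ary relationship (hypothetical mirror of the OWL class)."""

    name: str               # identifier of the relationship individual
    relationship_type: str  # e.g. "associated"
    genetic_factor: str     # linked via isObservedRelationshipOf
    disease: str            # linked via isRelationshipWith
    study: str              # linked via isObservedIn
    evidence: str           # linked via hasSupportingEvidence

    def as_binary_links(self):
        """Decompose the N-ary fact into (subject, property, object) links."""
        return [
            (self.name, "isObservedRelationshipOf", self.genetic_factor),
            (self.name, "isRelationshipWith", self.disease),
            (self.name, "isObservedIn", self.study),
            (self.name, "hasSupportingEvidence", self.evidence),
        ]


rel = ObservedRelationship(
    name="associated_with_1",
    relationship_type="associated",
    genetic_factor="a_3_intronic_SNP_rs3818247",
    disease="Type_2_Diabetes_",
    study="Disease_Gene_Study_15047632",
    evidence="odds_ratio_OR_1.49",
)
links = rel.as_binary_links()
for link in links:
    print(link)
```

Every link shares the same subject, which is what lets one relationship individual tie together the SNP, the disease, the study and the supporting evidence.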
The top-level of OGSF-DM
• Adopted terms from BFO (Basic Formal Ontology): Continuant, Occurrent, Independent_Continuant, Dependent_Continuant, Quality

The position of core concepts

CLASS: Observed_Relationship
• Class hierarchy
• Constraints of class

The term ‘Allele’ is polysemous
• Genetics definition: an allele is either one of a pair (or series) of alternative forms of a gene that can occupy the same locus on a particular chromosome, and that control the same character of the phenotype. (http://www.thefreedictionary.com/allele)
• “Allele” appeared in different resources:

Meaning of Allele | Appeared Form | Resource
the variant of a gene in an individual | disease “allele” | original paper
representation of SNP | “allele/allele” at DNA, RNA and amino acid level | HGVBase
allele sharing in sibs | IBS, IBD “allele” | linkage study

Allele CLASS in OGSF-DM
• An abstraction
• Currently, it satisfies the data model
• Needs to be refined in the future

Gene concept has evolved
• 1860s-1900s: Gene as a discrete unit of heredity
• 1910s: Gene as a distinct locus
• 1940s: Gene as a blueprint for a protein
• 1950s: Gene as a physical molecule
• 1960s: Gene as transcribed code
• 1970s-1980s: Gene as ORF sequence pattern
• 1990s-2000s: Gene as annotated genomic entity
• 2007-: Gene as …
Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.

Some definitions of ‘gene’
• Human Genome Nomenclature Organization: “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology” (Wain et al. 2002)
• Rat Genome Database: “the DNA sequence necessary and sufficient to express the complete complement of functional products derived from a unit of transcription” (2003)
• Sequence Ontology Consortium: “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions” (Pearson 2006)
• ENCODE Project Consortium: “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” (Gerstein et al. 2007)
• MeSH: genes are “Specific sequences of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represent functional units of HEREDITY. Most eukaryotic genes contain a set of coding regions (EXONS) that are spliced together in the transcript, after removal of intervening sequence (INTRONS) and are therefore labeled split genes.”

Gene CLASS in OGSF-DM
• A placeholder
• The instance of Gene is the name of the gene which appears in the research paper

Step3. Integration
Importing two ontologies:
• Ontology of glucose metabolism disorders
   • A slim OBO file was extracted from the Human Disease Ontology
   • The OBO file was converted to an OWL file
   • The class hierarchy was restructured; new terms from “Joslin’s Diabetes Mellitus” were added
• Ontology of geographical regions
   • Generated by hand, adopting the terms from MeSH 2008 “Geographic Locations [Z01]”

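The slides do not say which tool was used to extract the slim file, so the following is only a sketch of the idea: parse the `[Term]` stanzas of an OBO file, then keep one root term and everything below it in the `is_a` hierarchy. The sample text, the `EX:` identifiers and the function names are invented for illustration and are not real Human Disease Ontology entries.

```python
from collections import defaultdict

# Invented miniature OBO document; identifiers are placeholders.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0001
name: disease

[Term]
id: EX:0002
name: glucose metabolism disorder
is_a: EX:0001 ! disease

[Term]
id: EX:0003
name: diabetes mellitus
is_a: EX:0002 ! glucose metabolism disorder

[Term]
id: EX:0004
name: cardiovascular disease
is_a: EX:0001 ! disease
"""


def parse_terms(text):
    """Read [Term] stanzas into {id: {"id", "name", "is_a": [...]}}."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; other stanza types are skipped.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! comment" that repeats the parent name.
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def slim(terms, root_id):
    """Keep the root term and all of its is_a descendants."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children[parent].append(term_id)
    keep, stack = set(), [root_id]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(children[node])
    return {term_id: terms[term_id] for term_id in sorted(keep)}


subset = slim(parse_terms(SAMPLE_OBO), "EX:0002")
for term_id, term in subset.items():
    print(term_id, term["name"])
```

Converting the resulting subset to OWL and restructuring the hierarchy are separate steps that this sketch does not cover.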
Step4. Implementation and Evaluation
• Protégé 3.3.1 + OWL
• SWRL rule example: hasPopulation-1 Rule
   isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z) → hasPopulation(?x, ?z)
   This rule infers the population (z) of the Observed_Relationship (x); y is a Disease_Gene_Study.

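The effect of the hasPopulation-1 rule can be imitated outside a reasoner as a join over two sets of facts. The sketch below is plain Python rather than SWRL, and `apply_has_population_rule` is a hypothetical helper. The individual names are those of the worked example in the following slides.

```python
def apply_has_population_rule(triples):
    """isObservedIn(x, y) AND hasStudyPopulation(y, z) => hasPopulation(x, z)."""
    # Index study -> populations so the join does not rescan all facts.
    populations = {}
    for subject, prop, obj in triples:
        if prop == "hasStudyPopulation":
            populations.setdefault(subject, set()).add(obj)
    inferred = set()
    for x, prop, y in triples:
        if prop == "isObservedIn":
            for z in populations.get(y, ()):
                inferred.add((x, "hasPopulation", z))
    return inferred


facts = {
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation",
     "an_ashkenazi_jewish_population"),
}
inferred = apply_has_population_rule(facts)
print(inferred)
```

The relationship individual never mentions a population directly; it obtains one only through the study it was observed in, which is the point of the rule.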
The example article
Full text URL: http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134

Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
   ⋂ ∀ hasSupportingEvidence ∋ ( {odds_ratio_OR_1.49} )
   ⋂ ∃ isObservedIn ∋ ( {Disease_Gene_Study_15047632} )
   ⋂ ∃ isObservedRelationshipOf ∋ ( {a_3_intronic_SNP_rs3818247} )
   ⋂ ∃ isRelationshipWith ∋ ( {Type_2_Diabetes_} )
This means that a 3’ intronic SNP, rs3818247, is associated with Type 2 Diabetes, with supporting evidence of OR 1.49. The relationship is an association, but in this study it is stated to be neither a susceptibility nor a resistance factor.

Asserting individual 2),3),4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
   ⋂ ∀ hasOR ∋ ( {1.49} )
   ⋂ ∀ hasCI95 ∋ ( {1.15-1.90} )
   ⋂ ∃ hasP ∋ ( {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
   ⋂ ∃ hasClassifiedGroup ∋ ( {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ∋ ( {342 int} )
   ⋂ ∀ isPartOf ∋ ( {an_ashkenazi_jewish_population} )
4) Case_Group_1 ⊆ Classified_Group
   ⋂ ∃ hasPopulationSize ∋ ( {275 int} )
   ⋂ ∀ isPartOf ∋ ( {an_ashkenazi_jewish_population} )
2), 3) and 4) together mean that the study was a case-control study (case size = 275, control size = 342) in an Ashkenazi Jewish population.
Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252, uncorrected P = 0.0028).

Asserting individual 5),6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
   ⋂ ∀ hasPubMedID ∋ ( {PMID_15047632} )
   ⋂ ∃ hasStudyPopulation ∋ ( {an_ashkenazi_jewish_population} )
   ⋂ ∀ hasURI ∋ ( {http://diabetes.diabetesjournals.org/cgi/content/full/53/4/1134} )
6) an_ashkenazi_jewish_population ⊆ Population_Group
   ⋂ ∃ hasPopulationCharacteristic ∋ ( {Jews} )
   ⋂ ∃ hasGeographicalSite ∋ ( {Israel} ⋂ {U.S.} )
5) and 6) mean:
① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is located in Israel and the U.S.;
③ The PubMed ID and URL of this paper were collected.

The core conception
Putting 1)-5) together, the core conception of this one relationship is built:
relationship {associated} between the {3_intronic_SNP_rs3818247} and {Type_2_Diabetes}, observed in {an_ashkenazi_jewish_population} from a study {PMID_15047632}.

Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
   ⋂ ∃ hasAlleleComponent ∋ ( {DNA_Level_Allele_T} ⋂ {DNA_Level_Allele_G} )
   ⋂ ∃ hasGenomeSite ∋ ( {flanking_3_intronic} )
   ⋂ ∃ isGeneticVariantOf ∋ ( {hepatocyte_nuclear_factor-4_alpha} )
   ⋂ ∃ hasVariantDatabase ∋ ( {HGVBase_SNP002310533} ⋂ {dbSNP_rs3818247} )
This means that the 3’ intronic SNP rs3818247 is a htSNP of hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic sequence of the gene. The alleles of this SNP are T/G at the DNA level.
Reference database entries:
1) HGVBase: “SNP002310533”
2) dbSNP: “rs3818247”

Discussion
• A hybrid of the middle-out and top-down approaches was used to build our ontology.
• BFO is important for harmonizing the domain ontologies in our case.
• The ontology can be applied to other complex diseases too.
• We anticipate further applications of this ontology:
   • Information retrieval
   • Knowledge base development
   • Establishing logic rules
   • Mapping or linking to other ontologies, such as GO, Mammalian Phenotype, and so on.