Primary Immunodeficiency Disease (PID) PhenomeR (An integrated web-based ontology resource towards...



Primary Immunodeficiency Disease (PID) PhenomeR (An integrated web-based ontology resource towards establishment of PID E-clinical decision support system)

Workflow diagram labels:

Phenotype ontology database

PID Phenotype KnowledgeBase Search and Query interface - "PhenomeR"

OWL, RDF files generation

Locality principle

PID quality check by semi-automated method

Consistency principle

Conservativity principle

PID quality check by logic-based assessment method

Mapped terms using standard sources:
Human Disease (DOID)
Human Phenotype Ontology (HPO)
Online Mendelian Inheritance in Man - Metathesaurus source processing (OMIM-MTHU)
Symptom Ontology (SYMP)
Systematized Nomenclature of Medicine Clinical Terms (SNOMEDCT)
The Unified Medical Language System - Concept Unique Identifiers (UMLS_CUI)

Collected PID phenotype terms

Phenotype annotation tool

RAPID, IDR and Literature

Is Mapped? (Yes / No)
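The "Is Mapped?" decision in the workflow can be illustrated with a minimal sketch. It assumes mapping is a normalized string lookup against each standard source; the poster does not describe the actual matching logic, and the vocabularies, the `normalize` helper and the `split_mapped` function below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical miniature vocabularies; real sources hold many thousands of terms.
STANDARD_SOURCES: Dict[str, Set[str]] = {
    "HPO": {"recurrent infections", "splenomegaly"},
    "SNOMEDCT": {"splenomegaly", "eczema"},
}


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def split_mapped(terms: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return ({term: [sources containing it]}, [terms found in no source])."""
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for term in terms:
        key = normalize(term)
        hits = sorted(name for name, vocab in STANDARD_SOURCES.items() if key in vocab)
        if hits:
            mapped[term] = hits
        else:
            unmapped.append(term)  # candidate for manual or semi-automated review
    return mapped, unmapped
```

Terms that end up in the unmapped list correspond to the "No" branch of the diagram, which the poster routes to quality checks rather than discarding.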

CONCLUSION

Overall, this kind of analysis should bridge the gap between genotype and phenotype, thereby improving phenotype-based genetic analysis of PID genes. Moreover, it should help clinicians confirm early PID diagnosis and implement proper therapeutic interventions.

We believe that the structured data format presented in RPO should help biomedical researchers carry out further computational analysis and assist clinicians in identifying diagnosed PID.

ABSTRACT

The main challenge for in silico genotype-phenotype correlation in any genetic disease is to standardize phenotype ontology terms and genotype data. We previously developed a molecular disease database named RAPID, the Resource of Asian Primary Immunodeficiency Diseases (PID) (http://rapid.rcai.riken.jp), a web-based informatics platform that enables PID experts to easily mine collected genomic, transcriptomic, and proteomic data on PID-causing genes. As of February 2013, RAPID comprises 265 PIDs and 243 genes, of which 233 genes are reported with over 5000 unique disease-causing mutations annotated from about 1800 PubMed citations.

We hereby introduce a newly developed PID ontology browser, “PhenomeR” (http://rapid.rcai.riken.jp/ontology/v1.0/phenomer.php), for systematic integration and analysis of PID phenotype data with the genotype data taken from RAPID. It currently holds 1438 PID phenotype terms that are mapped and standardized using a logic-based assessment approach and represented in Web Ontology Language (OWL) and Resource Description Framework (RDF) formats. These semantic web formats support easy data exchange and validation, as well as interpretation of PID phenotype-genotype correlation using various computational approaches.

The motivation for developing PhenomeR is mainly to help researchers and clinicians identify reported and novel PID-causing genes, and to determine genes involved in PID through reported disease-causing mutations and their respective observed symptoms. In essence, PID PhenomeR serves as an active integrated platform for PID phenotype data, in which the generated semantic framework is exposed through an integrated knowledge base query interface, i.e. a SPARQL Protocol and RDF Query Language (SPARQL) endpoint, for establishing a well-informed PID e-clinical decision support system.

Successful outcome and challenges

PhenomeR aims to build hierarchical ontology class structures and entities for all observed PID phenotypic terms. These can then be used through an integrated knowledge base query interface, SPARQL Protocol and RDF Query Language (SPARQL), to screen data and implement algorithms that compile data from multiple sources and measure statistically significant datasets with greater sensitivity, specificity and degree of confidence, working towards a well-informed clinical decision support system.
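To make the query idea concrete, here is a minimal sketch of what a "distinct subjects" style SPARQL query does, written against an in-memory list of triples rather than a real endpoint. The triples, the `rpo:` identifiers and the helper names are hypothetical placeholders, and the pattern matcher is only a stand-in for a SPARQL engine.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples; the identifiers are placeholders, not real RPO IRIs.
TRIPLES: List[Triple] = [
    ("rpo:TermA", "rdfs:subClassOf", "rpo:Cardiovascular"),
    ("rpo:TermA", "rdfs:label", "Term A"),
    ("rpo:TermB", "rdfs:subClassOf", "rpo:Cardiovascular"),
    ("rpo:Cardiovascular", "rdfs:label", "Cardiovascular"),
]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard (like ?var)."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def distinct_subjects(triples: Iterable[Triple], p: Optional[str] = None,
                      o: Optional[str] = None) -> List[str]:
    """Mimic SELECT DISTINCT ?s WHERE { ?s p o }, keeping first-seen order."""
    seen = {}
    for s, _, _ in match(triples, p=p, o=o):
        seen.setdefault(s, None)  # dict keys preserve insertion order
    return list(seen)
```

Calling `distinct_subjects(TRIPLES)` lists every subject once, while passing `p="rdfs:subClassOf"` and `o="rpo:Cardiovascular"` restricts the result to the direct subclasses of that category.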

Mapping the unmapped terms in PhenomeR is a challenging task, since some of them are not available in any of the databases. This ongoing effort will soon implement a systematic, integrated approach for mapping all these new unmapped terms, moving towards an open, community-driven semantic web (SW) technology.

PhenomeR enables users to easily access, search, query and analyze PID phenotype terms associated with genes, diseases and mutations.

Masuya, H., Y. Makita, et al. (2011). "The RIKEN integrated database of mammals." Nucleic Acids Res. 39:D861-70.

Acknowledgements

The authors acknowledge RIKEN for providing necessary computing resources, the research team at the Institute of Bioinformatics (IOB), Bangalore, India, for their collaboration in developing RAPID, and alumni of our lab as well as all PID physicians involved in the PID Japan project for their valuable input and suggestions.

Collaboration and funding

The PID project was initiated by the IOB and the Immunogenomics research group at the Research Centre for Allergy and Immunology (RCAI), RIKEN Yokohama Institute, Japan, and it was funded by The Asia S&T Strategic Cooperation Promotion Program, Special Coordination Funds for Promoting Science and Technology, MEXT, Japan.

Overview of PID-phenomeR

Contact: [email protected]

Search result of PID phenotype term with category ‘Cardiovascular’

Subazini Thankaswamy Kosalai and Sujatha Mohan

Research Unit for Immunoinformatics, RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

Statistics

RPO summary page in NCBO BioPortal

Registration form for submitting new PID terms

(A) DATA COLLECTION

(B) DATA STANDARDIZATION

(C) DATA STORAGE & RETRIEVAL

Database Statistics

Phenotype terms                 1466
Semantic types                    24
Category                          29
Subcategory                       45
Terms in Multiple Category        17
Terms in Multiple subcategory     10
Newly mapped terms                51

OWL Statistics

Classes                               1549
Individuals                              -
Classes with single subclass           144
Classes with more than 25 subclasses  1346
Average number of Siblings             276
Object Property                        161
Data Property                            9
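Metrics of the kind shown under OWL Statistics can be derived from the subclass hierarchy. The sketch below is an assumption about how such counts might be computed from (child, parent) pairs; the poster does not say how its figures were produced, and the `hierarchy_stats` function and its metric definitions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hierarchy_stats(edges: Iterable[Tuple[str, str]], many: int = 25) -> Dict[str, float]:
    """Summarize a class hierarchy given (child, parent) pairs.

    The pairs could come, for example, from rdfs:subClassOf statements.
    """
    children: Dict[str, Set[str]] = defaultdict(set)
    classes: Set[str] = set()
    for child, parent in edges:
        children[parent].add(child)
        classes.update((child, parent))
    counts = [len(c) for c in children.values()]
    return {
        "classes": len(classes),
        "single_subclass": sum(1 for n in counts if n == 1),
        "many_subclasses": sum(1 for n in counts if n > many),
        # Assumed definition: mean number of direct subclasses per parent class.
        "avg_subclasses": sum(counts) / len(counts) if counts else 0.0,
    }
```

The threshold `many` defaults to 25 to mirror the "more than 25 subclasses" row; other tools may define these metrics differently.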

Home page

PID PhenomeR Database Schema

RESPONSE

QUERY

Reported list of genes

Reported list of mutation data

Primary information page of STK4 gene in RAPID

Mutation analysis of STK4 gene

Multiple terms search output

Hyperlinked PubMed reference citation

Term C3 deficiency viewed using Protégé 4.1 OntoGraf

RDF file generated using OWL Syntax Converter

Master list of PID phenotype terms, associated features and relationships in Excel format

PID PhenomeR – Download Option

(http://bioportal.bioontology.org/ontologies/3114)

Search result of phenotype term

Search result of phenotype term beginning with ‘Recurrent’

Term hierarchy visualization using NCBO widget from NCI thesaurus

PID PhenomeR Advanced search options

Reported list of mutation data

All distinct subjects from RPO ontology queried using SPARQL

http://bioportal.bioontology.org/projects/171

PID PhenomeR project in NCBO BioPortal

PID PhenomeR – Download Option – OWL format

RAPID - Home page

Search result of PID phenotype term with semantic type - ‘Acquired Abnormality’

PID-phenomeR features

Presents a web-based, user-friendly interface for accessing, querying, browsing and analyzing PID phenotype terms

Integrates semantically standardized phenotype vocabularies from RAPID along with PIDs, genes and disease-causing mutations into a relational ontology for inference of genotype-phenotype correlation

Provides PID phenotype data in standardized downloadable formats (OWL, RDF and Excel) for easy sharing and data exchange with other interested research groups

Displays the phenotype terms in a tree structure using the NCBO widget

Provides an integrated knowledge base query interface: SPARQL Protocol and RDF Query Language (SPARQL)

Promotes a network of active, open, community-driven semantic web technology
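As an illustration of the RDF download option, the following sketch serializes one phenotype term as N-Triples lines. It is not the project's export code: the `BASE` namespace, the term identifiers and the `term_to_ntriples` helper are hypothetical, and identifiers are assumed to contain only characters that are valid inside an IRI.

```python
from typing import List, Optional

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/rpo#"  # hypothetical namespace, not the real RPO IRI


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def term_to_ntriples(term_id: str, label: str, parent_id: Optional[str] = None) -> List[str]:
    """Return N-Triples lines giving the term a label and an optional superclass."""
    subject = f"<{BASE}{term_id}>"
    lines = [f"{subject} <{RDFS}label> {literal(label)} ."]
    if parent_id is not None:
        lines.append(f"{subject} <{RDFS}subClassOf> <{BASE}{parent_id}> .")
    return lines
```

For example, `term_to_ntriples("T1", "C3 deficiency", "Immune")` yields one label triple and one subclass triple that any RDF tool able to read N-Triples should accept.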

Publications – PID project

Subazini Thankaswamy Kosalai and Sujatha Mohan. PID PhenomeR - An integrated platform for developing phenotype ontology structures for primary immunodeficiency diseases (Database, Oxford University Press - In communication)


RDF and OWL formats viewed in Link Data and Protégé