Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability
Ora Lassila • Principal Architect (Nokia Mobile Solutions); also an advisor to Nokia’s top management
• Elected member of W3C’s Advisory Board since 1998
• Earlier: Research Fellow (Nokia Research), W3C Fellow (MIT), Project Manager (CMU), entrepreneur, etc.
• Ph.D. from Helsinki University of Technology (CS)
• http://www.lassila.org/
Amit Sheth • LexisNexis Ohio Eminent Scholar, Director, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University
• Educator, researcher, entrepreneur – 2 companies, products, deployed apps, W3C and biomedical community standards
• Earlier: UGA, Telcordia, Unisys, Honeywell
• http://knoesis.org/amit
• Semantic Web • some background
Ora
• Semantic Web in use • examples of applications in traditional clinical care to translational medicine
Amit
• Challenges (and promise) • what makes this difficult • why do we want to pursue it anyway
Ora (technical) Amit (health)
• Often characterized as the “next generation of the World Wide Web” • Web content amenable to automation • (current content intended for humans…)
• In reality, the Semantic Web is a vision of the future of (personal) computing • machines working on behalf of their human users • more autonomy, handling of unanticipated situations
• Heavy reliance on knowledge representation & reasoning • also multi-agent systems, other AI-based technologies
• At the core, the Semantic Web is about • describing things (objects, concepts, services, …) • querying the descriptions • reasoning about the descriptions
• As such, it is knowledge representation • for the Web • (or KR using standardized Web technologies)
• (in comparison, the “old Web” was really about documents and finding them…)
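The three activities named above (describing, querying, reasoning) can be sketched in a few lines of plain Python. This is only an illustration, not how RDF stores are built: the `ex:` names are invented, the store is a simple set of subject-predicate-object tuples, and the single inference rule (type propagation along `rdfs:subClassOf`) stands in for much richer reasoning.

```python
# Describing: a few subject-predicate-object statements.
# All "ex:" names are made up for illustration.
triples = {
    ("ex:aspirin", "rdf:type", "ex:Analgesic"),
    ("ex:Analgesic", "rdfs:subClassOf", "ex:Drug"),
    ("ex:aspirin", "ex:treats", "ex:headache"),
}


def query(store, s=None, p=None, o=None):
    """Querying: return all triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(store):
    """Reasoning: if x has type C and C is a subclass of D, x also has type D."""
    inferred = set(store)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        for (x, _, c) in query(inferred, p="rdf:type"):
            for (_, _, d) in query(inferred, s=c, p="rdfs:subClassOf"):
                if (x, "rdf:type", d) not in inferred:
                    inferred.add((x, "rdf:type", d))
                    changed = True
    return inferred
```

Querying the raw store for things of type `ex:Drug` returns nothing; after `infer_types`, the same query finds `ex:aspirin`, because the subclass statement makes the implicit fact explicit.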
• Motivated by the need for automation • automation requires interoperability (via standards) • heavy process, high up-front investment • (alternative: hand-crafted but “brittle” programs…)
• Interoperability achieved by exposing meaning • accessible semantics • note: interoperability of any two systems can be achieved via engineering, but this does not scale
• Automation → autonomy • prevailing paradigm: agent-based systems • implies reasoning, planning, interoperable representations of knowledge
• Contrary to “Web 2.0”, the Semantic Web aims at achieving many things “ad hoc” • e.g., ad hoc mash-ups by non-computer-savvy people
• Shared (and accessible) semantics is the key to interoperability • Semantic Web introduces a fundamentally different approach to standardization • standardize how to say things and not what to say • ontological techniques allow “delayed semantic commitment”
• Semantic Web is built in a layered manner • Not everybody needs all the layers
Encoding characters: Unicode
Encoding structure: XML
Uniform metamodel: RDF + URI
Simple data models & taxonomies: RDF Schema
Rich ontologies: OWL
Queries: SPARQL, Rules: RIF
…
Semantic Web
• Achieve for data what the Web did for documents • Relationship with the original Semantic Web vision: no AI, no agents, no autonomy • Interoperability is still very important • interoperability of formats • interoperability of semantics
• Enables interchange of large data sets • (thus very useful in, say, collaborative research)
• Semantic Web vision is largely predicated on the availability of data • Linked Data is a movement that gets us there
[Figure: evolution of the Web, 1997 to 2007: Web 1.0, Web 2.0, Web 3.0]
Web of pages - text, manually created links - extensive navigation
Web of databases - dynamically generated pages - web query interfaces
Web of resources - data, services, mashups
Web of people - social networks, user-created casual content
Web of Sensors, Devices/IoT - 40 billion sensors, 5 billion mobile connections
Focus of processing: Keywords, Patterns, Objects, Situations/Events
Tech assimilated in life
Medical Informatics: Etiology, Pathogenesis, Clinical findings, Diagnosis, Prognosis, Treatment
Bioinformatics: Genome, Transcriptome, Proteome, Metabolome, Physiome, ...ome
Data sources: GenBank, UniProt, PubMed, ClinicalTrials.gov
...needs a connection
Biomedical Informatics: Hypothesis Validation, Experiment design, Predictions, Personalized medicine
More advanced capabilities for search, integration, analysis, linking to new insights and discoveries!
• Health Information Services: Elsevier iConsult
• Scientific Literature: PubMed, 300 documents published online each day
• User-contributed Content (Informal): Experts: GeneRIFs, WikiGene; Consumer: Blogs, Social Networks
• NCBI Public Datasets: Genome, Protein DBs, new sequences daily
• Laboratory Data: Lab tests, RT-PCR, Mass spec
• Clinical Data: Personal health history
Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
• W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/ • Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability • National Center for Biomedical Ontology: http://bioportal.bioontology.org/
• Status: In use continuously since 01/2006 • Where: Athens Heart Center & its partners and labs • What: Use of semantic Web technologies for clinical decision support
Examples demonstrating use of Semantic Web for Health Care and Life Sciences research projects and operational clinical or research applications
Details: http://knoesis.org/library/resource.php?id=00004
Annotate ICD9s Annotate Doctors
Lexical Annotation
Level 3 Drug Interaction
Insurance Formulary
Drug Allergy
Demo at: http://knoesis.org/library/demos/
owl:Thing
prescription_drug_brand_name
brandname_undeclared
brandname_composite
prescription_drug
monograph_ix_class
cpnum_group
prescription_drug_property
indication_property
formulary_property
non_drug_reactant
interaction_property
property
formulary
brandname_individual
interaction_with_prescription_drug
interaction
indication
generic_individual
prescription_drug_generic
generic_composite
interaction_with_non_drug_reactant
interaction_with_monograph_ix_class
• Status: Completed research • Where: NIH • What: queries across integrated data sources • Enriching data with ontologies for integration, querying, and automation • Ontologies beyond vocabularies: the power of relationships
gene
GO
PubMed
Gene name
OMIM
Sequence
Interactions
Glycosyltransferase
Congenital muscular dystrophy
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 http://knoesis.org/library/resource.php?id=00014
Congenital muscular dystrophy, type 1D
(GeneID: 9215)
has_associated_disease
has_molecular_function
Acetylglucosaminyltransferase activity
MIM:608840 Muscular dystrophy, congenital, type 1D
GO:0008375
has_associated_phenotype
has_molecular_function
EG:9215 LARGE
acetylglucosaminyltransferase
GO:0016757 glycosyltransferase
GO:0008194 isa
GO:0008375 acetylglucosaminyltransferase
GO:0016758
From medinfo paper.
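# Namespace IRIs are placeholders; the slide does not give them.
PREFIX : <http://example.org/schema#>
PREFIX GO: <http://example.org/go#>
SELECT DISTINCT ?t ?g ?d
WHERE {
  ?t :is_a GO:0016757 .
  ?g :has_molecular_function ?t .
  ?g :has_associated_phenotype ?b2 .
  ?b2 :has_textual_description ?d .
  FILTER regex(?d, "muscular dystrophy", "i")
  FILTER regex(?d, "congenital", "i")
}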
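To make the shape of this query concrete, here is a rough in-memory equivalent in Python. It is a sketch under stated assumptions: the triples are a tiny hand-picked subset, the two `is_a` edges are assumed for illustration (the extracted slide text lists the GO identifiers but does not preserve the edges), and, unlike the query as written, the sketch follows `is_a` links transitively so that indirect descendants of glycosyltransferase are also found.

```python
import re

# Illustrative subset only; the is_a edges are an assumption.
triples = [
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008194", "is_a", "GO:0016757"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
]


def objects(s, p):
    """All objects o such that (s, p, o) is in the store."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def subjects(p, o):
    """All subjects s such that (s, p, o) is in the store."""
    return [s for (s, p2, o2) in triples if p2 == p and o2 == o]


def descendants(term):
    """All terms that reach `term` through one or more is_a links."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for child in subjects("is_a", current):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def genes_for(root, *keywords):
    """(term, gene, description) rows whose description matches every keyword."""
    rows = set()
    for term in descendants(root):
        for gene in subjects("has_molecular_function", term):
            for phenotype in objects(gene, "has_associated_phenotype"):
                for text in objects(phenotype, "has_textual_description"):
                    if all(re.search(k, text, re.IGNORECASE) for k in keywords):
                        rows.add((term, gene, text))
    return sorted(rows)
```

Calling `genes_for("GO:0016757", "muscular dystrophy", "congenital")` joins the ontology hierarchy, the gene annotations, and the phenotype descriptions in one pass, which is the point of the slide: the relationships, not just the vocabulary, carry the answer.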
• Status: Completed research • Where: NIH • What: Understanding the genetic basis of nicotine dependence. Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. • How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources).
• NIDA study on nicotine dependency • List of candidate genes in humans • Analysis objectives include:
o Find interactions between genes
o Identification of active genes – maximum number of pathways
o Identification of genes based on anatomical locations
• Requires integration of genome and biological pathway information
Genome and pathway information integration
Sources: Entrez Gene, Reactome, KEGG, HumanCyc, Gene Ontology, HomoloGene
Integrated fields: pathway, protein, pmid, GO ID, HomoloGene ID
BioPAX ontology
Entrez Knowledge Model (EKoM)
http://knoesis.org/library/resource.php?id=00221
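Two of the analysis objectives (genes active in the most pathways, and gene pairs that may interact) can be illustrated once the pathway records from different sources sit in one place. The sketch below is hypothetical throughout: gene, pathway, and source names are placeholders, and it assumes the sources already use the same pathway identifiers, which is exactly the alignment the shared ontologies are meant to provide.

```python
# Placeholder data; none of these names come from the study.
candidate_genes = ["GENE_A", "GENE_B", "GENE_C"]

pathway_records = [
    {"source": "source_1", "pathway": "pathway_x", "gene": "GENE_A"},
    {"source": "source_2", "pathway": "pathway_x", "gene": "GENE_A"},  # same fact, second source
    {"source": "source_2", "pathway": "pathway_y", "gene": "GENE_A"},
    {"source": "source_1", "pathway": "pathway_y", "gene": "GENE_B"},
    {"source": "source_3", "pathway": "pathway_z", "gene": "GENE_D"},  # not a candidate gene
]


def pathways_by_gene(genes, records):
    """Merge records from all sources into one gene -> set of pathways index."""
    index = {gene: set() for gene in genes}
    for record in records:
        if record["gene"] in index:
            index[record["gene"]].add(record["pathway"])
    return index


def rank_by_pathway_count(index):
    """Genes ordered by how many distinct pathways they take part in."""
    return sorted(index, key=lambda gene: (-len(index[gene]), gene))


def shared_pathways(index):
    """Gene pairs sharing at least one pathway, a crude proxy for 'may interact'."""
    genes = sorted(index)
    return [
        (a, b, sorted(index[a] & index[b]))
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if index[a] & index[b]
    ]
```

Using a set per gene means a pathway reported by two sources is counted once, which is the deduplication an integrated knowledge base gives for free.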
• Status: Research prototype – in regular lab use • Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA • What: Semantics and Services Enabled Problem Solving Environment for Trypanosoma cruzi • Who: Kno.e.sis, UGA, NCBO
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University
Tarleton Research Group, Center for Tropical and Emerging Global Diseases (CTEGD), University of Georgia
Large Scale Distributed Information Systems (LSDIS), University of Georgia
National Center for Biomedical Ontology (NCBO), Stanford University
The Wellcome Trust Sanger Institute, Cambridge, UK
The Oswaldo Cruz Institute (Fiocruz), Brazil
• T. cruzi is a protozoan parasite that causes Chagas disease, or American trypanosomiasis • Chagas disease is a leading cause of death in Latin America, where around 18 million people are infected with this parasite • Related parasites include Trypanosoma brucei and Leishmania major, which cause African trypanosomiasis and leishmaniasis, respectively.
T. brucei surrounded by red blood cells in a smear of infected blood. (Copyright: Jürgen Berger and Dr. Peter Overath, Max Planck Institute for Developmental Biology, Tübingen)
Trykipedia - a Wiki-based platform for collaboration of Parasite Research Community
• Data Resources
o Internal lab data (from Tarleton Research Group): Gene Knockout, Strain Creation, Microarray, and Proteome
o External databases (TriTrypDB, ProtozoaDB, DrugBank, etc.)
• Ontologies
o Parasite Lifecycle Ontology (PLO)
o Parasite Experiment Ontology (PEO)
• PKR supports complex biological queries related to T. cruzi drugs, vaccination, or gene knockout targets; for example:
o Find all genes with proteomic expression in mammalian lifecycle stage with GPI anchor or signal peptide predictions.
o Find genes annotated as potential vaccine candidates.
o Find all genes with proteomic expression evidence in the mammalian host lifecycle stages for T. cruzi
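The first example query combines three kinds of evidence: proteomic expression, lifecycle stage, and sequence-based predictions. A minimal Python sketch of that combination follows. The gene records are invented, and the stage set is a simplifying assumption (amastigote and trypomastigote treated as the mammalian-host stages); in the real system this knowledge would come from the Parasite Lifecycle Ontology rather than a hard-coded set.

```python
# Simplifying assumption: stages treated as occurring in the mammalian host.
MAMMALIAN_STAGES = {"amastigote", "trypomastigote"}

# Hypothetical annotations; gene IDs and values are placeholders.
genes = [
    {"id": "gene_1", "proteomic_stages": {"amastigote"},
     "gpi_anchor": True, "signal_peptide": False},
    {"id": "gene_2", "proteomic_stages": {"epimastigote"},
     "gpi_anchor": True, "signal_peptide": True},
    {"id": "gene_3", "proteomic_stages": {"trypomastigote", "epimastigote"},
     "gpi_anchor": False, "signal_peptide": True},
    {"id": "gene_4", "proteomic_stages": {"amastigote"},
     "gpi_anchor": False, "signal_peptide": False},
]


def expressed_in_mammalian_stage(gene):
    """True if there is proteomic evidence in at least one mammalian stage."""
    return bool(gene["proteomic_stages"] & MAMMALIAN_STAGES)


def candidate_genes(records):
    """Genes expressed in a mammalian stage with a GPI anchor or signal peptide."""
    return [
        gene["id"] for gene in records
        if expressed_in_mammalian_stage(gene)
        and (gene["gpi_anchor"] or gene["signal_peptide"])
    ]
```

Here `gene_2` is excluded because its only expression evidence is in a non-mammalian stage, and `gene_4` because it has neither prediction.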
*T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
Gene Knockout and Strain Creation*
Process steps: Sequence Extraction → Plasmid Construction → Transfection → Drug Selection → Cell Cloning
Inputs and outputs: Gene Name, 3' & 5' Region, Knockout Construct Plasmid, Drug Resistant Plasmid, T. cruzi sample, Transfected Sample, Selected Sample, Cloned Sample
Provenance question: Cloned Sample → Gene Name?
Related Queries from Biologists
• List all groups in the lab that used a Target Region Plasmid?
• Which researcher created a new strain of the parasite (with ID = 66)?
• An experiment was not successful – has this experiment been conducted earlier? What were the results?
Complex queries can also include: - on-the-fly Web services execution to retrieve additional data - inference rules to make implicit knowledge explicit
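Questions like these are provenance queries: they walk backwards from a result through the steps that produced it. The sketch below shows the idea on invented records. The step names follow the knockout workflow, but the researchers, entity IDs (other than the strain ID 66 mentioned in the question), and record layout are all hypothetical.

```python
# Each record: a process step, who ran it, what went in, what came out.
steps = [
    {"process": "sequence_extraction", "researcher": "researcher_1",
     "inputs": ["gene:example_gene"], "outputs": ["region:1"]},
    {"process": "plasmid_construction", "researcher": "researcher_1",
     "inputs": ["region:1"], "outputs": ["plasmid:7"]},
    {"process": "transfection", "researcher": "researcher_2",
     "inputs": ["plasmid:7", "sample:wild_type"], "outputs": ["sample:transfected_3"]},
    {"process": "drug_selection", "researcher": "researcher_2",
     "inputs": ["sample:transfected_3"], "outputs": ["sample:selected_3"]},
    {"process": "cell_cloning", "researcher": "researcher_2",
     "inputs": ["sample:selected_3"], "outputs": ["strain:66"]},
]


def producer(entity):
    """The step that produced the entity, or None for raw inputs."""
    for step in steps:
        if entity in step["outputs"]:
            return step
    return None


def lineage(entity):
    """Every upstream entity the given entity was derived from."""
    seen, frontier = set(), [entity]
    while frontier:
        step = producer(frontier.pop())
        if step is None:
            continue
        for source in step["inputs"]:
            if source not in seen:
                seen.add(source)
                frontier.append(source)
    return seen


def created_by(entity):
    """Researcher who ran the step that produced the entity."""
    step = producer(entity)
    return step["researcher"] if step else None
```

`created_by("strain:66")` answers the "which researcher" question directly, and filtering `lineage("strain:66")` for `gene:` entries recovers the gene a cloned sample traces back to.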
1. Describe drug users’ knowledge, attitudes, and behaviors related to illicit use of OxyContin®
2. Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums
3. Collaboration between Kno.e.sis and CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ.
• Volatile nature of execution environments • May have an impact on multiple activities/tasks in the workflow • HF Pathway • New information about diseases, drugs becomes available • Affects treatment plans, drug-drug interactions
• Need to incorporate the new knowledge into execution • capture the constraints and relationships between different tasks/activities
New knowledge about treatment found during the execution of the pathway
New knowledge about drugs, drug-drug interactions
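One way to picture "incorporating new knowledge into execution" is a pathway whose pending tasks are re-checked whenever the knowledge base changes. The following Python sketch is an assumption-laden toy, not the system described in the slides: drug and task names are placeholders, and the only constraint modeled is a pairwise drug-drug interaction.

```python
# Known interactions as unordered drug pairs. Drug names are placeholders.
interactions = {frozenset({"drug_a", "drug_b"})}

# A hypothetical care pathway; completed tasks are kept for context.
pathway = [
    {"task": "start_therapy", "drugs": ["drug_a"], "done": True},
    {"task": "add_second_agent", "drugs": ["drug_c"], "done": False},
    {"task": "discharge_medication", "drugs": ["drug_d"], "done": False},
]


def conflicts(tasks, known):
    """Pending tasks whose drugs interact with a drug used elsewhere in the pathway."""
    flagged = []
    for task in tasks:
        if task["done"]:
            continue
        others = {d for t in tasks if t is not task for d in t["drugs"]}
        for drug in task["drugs"]:
            if any(frozenset({drug, other}) in known for other in others):
                flagged.append(task["task"])
                break
    return flagged


def learn_interaction(drug_1, drug_2):
    """Record a newly reported interaction and re-check the running pathway."""
    interactions.add(frozenset({drug_1, drug_2}))
    return conflicts(pathway, interactions)
```

Before any update, `conflicts(pathway, interactions)` is empty; after `learn_interaction("drug_a", "drug_c")`, the pending `add_second_agent` task is flagged, so the change in knowledge reaches a pathway that is already executing.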
Diabetes mellitus adversely affects the outcomes in patients with myocardial infarction (MI), due in part to the exacerbation of left ventricular (LV) remodeling. Although angiotensin II type 1 receptor blocker (ARB) has been demonstrated to be effective in the treatment of heart failure, information about the potential benefits of ARB on advanced LV failure associated with diabetes is lacking. To induce diabetes, male mice were injected intraperitoneally with streptozotocin (200 mg/kg). At 2 weeks, anterior MI was created by ligating the left coronary artery. These animals received treatment with olmesartan (0.1 mg/kg/day; n = 50) or vehicle (n = 51) for 4 weeks. Diabetes worsened the survival and exaggerated echocardiographic LV dilatation and dysfunction in MI. Treatment of diabetic MI mice with olmesartan significantly improved the survival rate (42% versus 27%, P < 0.05) without affecting blood glucose, arterial blood pressure, or infarct size. It also attenuated LV dysfunction in diabetic MI. Likewise, olmesartan attenuated myocyte hypertrophy, interstitial fibrosis, and the number of apoptotic cells in the noninfarcted LV from diabetic MI. Post-MI LV remodeling and failure in diabetes were ameliorated by ARB, providing further evidence that angiotensin II plays a pivotal role in the exacerbated heart failure after diabetic MI.
ARB possibly plays role in heart failure
Angiotensin II type 1 receptor blocker attenuates exacerbated left ventricular remodeling and failure in diabetes-associated myocardial infarction, Matsusaka H, et al.
possibly plays role in
Disease
Angiotensin Receptor Blocker (ARB)
Ontology: A Framework for Schema-Driven Relationship Discovery from Unstructured Text, Ramakrishnan, et al., ISWC 2006, LNCS 4273, pp. 583-596
• Matching medical requirements with availability of medical resources (Mumbai, India) • Project HERO Helpline for Emergency Response Operations
• For patients seeking immediate medical help
• Medical awareness in rural India • mMitra, info. service during pregnancy and childhood
emergency
Information bridge
Medical Emergency
Medical Resources
• Any specific problem (typically) has a specific solution that does not require Semantic Web technologies • Q: Why then is the Semantic Web attractive? A: For future-proofing
Semantic Web can be a solution to those problems and situations that we are yet to define
• Cultural resistance (“this smacks of AI…”) • Unfamiliar technology (e.g., reasoning) • Often implies complex representational models • procedural programs vs. declarative data
• Unclear business models • Also, actual technical challenges • scalability of query processing • complexity (and thus scalability) of reasoning • scalability of access control • …
• (merely an observation of what you may encounter…)
• What makes Semantic Web attractive and worth pursuing is…
Source: Mindlab, U of Maryland
• Serendipity in interoperability • can we interoperate with systems, devices and/or services we knew nothing about at design time?
• Serendipity in information reuse • with accessible semantics, this becomes easier…
• Serendipity in information integration • can information from independent sources be combined? • even simple forms of reasoning can help
(Source: Oxford American Dictionary)
• Semantic Web was designed to • accommodate different points of view • be flexible about what it can express (not preferential towards any particular domain or application)
• Combining information in new ways • we cannot anticipate all the possible ways in which information is used, combined ⇒ there is value to merely making information (data) available • using Semantic Web technologies lowers the threshold for “serendipitous reuse”
Clinical Care Insurance, Financial Aspects
Genetic Tests… Profiles
Follow up, Lifestyle
Clinical Trials Social Media
Patients, Public
Hospitals Doctors
Payors
CDC
CROs
Pharmaceutical Companies
FDA NIH (Research)
Universities, AMCs
From FDA, CDC
Translation 1: Genomic Research and Clinical Practice Translation 2: Clinical Research and Clinical Practice
Slide by: Vipul Kashyap
• For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these! • Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, Biosensors) and social content. Extensive use of mobile phones.
Credit: ece.virginia.edu
• Semantic Web is an “interoperability technology” • Linked Data is a step in the right direction • Many examples of viable usage of Semantic Web technologies • Words of warning about deployment • For health, Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view” • Significant research challenges remain as Health presents the most complex domain
• Researchers: Satya Sahoo, Dr. Priti Parikh, Pablo Mendes, Cartic Ramakrishnan, and Kno.e.sis team • Collaborators: Athens Heart Center (Dr. Agrawal), NLM (Olivier Bodenreider), CCRC-UGA (Will York), UGA (Tarleton), Bioinformatics-WSU (Raymer) • Funding: NIH/NCRR, NIH/NHLBI (R01), NSF
http://knoesis.org
1. A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Intl Semantic Web Conference, 2006.
2. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information WWW2007 HCLS Workshop, May 2007.
3. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology", Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4
4. Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, Amit P. Sheth, An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence, Journal of Biomedical Informatics, 2008.
5. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596
6. Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
7. Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010.
• Papers: http://knoesis.org/library • Demos at: http://knoesis.wright.edu/library/demos/