Introduction to The Semantic Web
Rick Bradshaw, M.S., Sr. Data Architect
Office of the Associate VP Health Sciences IT
Overview
• Introduce the Semantic Web
• Interactive study of ClinicalTrials.gov, Semantic Web style
  – Take a closer look at RDF
  – Run example SPARQL queries
• Introduce federation
  – Run example SPARQL queries against federated data
Semantic Web Definition
• The Semantic Web makes it possible to attach machine-readable semantic data and metadata to resources distributed across the web
  – Often associated with specific technologies:
    • RDF – Resource Description Framework
    • RDFS – RDF Schema
    • OWL – Web Ontology Language
• Web 3.0 (?)
http://en.wikipedia.org/wiki/Semantic_Web
Machine-readable
• A computer can read and “understand” data
  – Ask specific questions and get specific answers
  – Aggregate specific data, perform calculations, and organize/order the returned data
• Can Google read and “understand” web data?
Example
• Specific question
  – How many Spinal Muscular Atrophy trials have been conducted at the University of Utah, and when were they conducted?
• Specific answer = ?
• Google’s answer
  – Query: “spinal muscular atrophy trial university of utah”
  – 14,500 pages
  – Top hit is very relevant in content
• Is it “computable”?
HTML
<h2>Enrolling/Ongoing: </h2>
<p>Clinical and Genetic Studies in Spinal Muscular Atrophy</p>
<p>Metabolic Dysfunction in SMA: impact of nutritional management</p>
<p>Prospective Study of Bone Abnormalities in SMA</p>
<p>STOP SMA: Phenylbutyrate trial in pre-symptomatic infants with SMA</p>
<p><span><span>Pilot newborn screening project for identification and prospective followup of infants with spinal muscular atrophy</span></span></p>
<p><span><span>Atalauren extension study in patients with Duchenne Muscular Dystrophy</span></span>…
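To make the “is it computable?” question concrete, here is a minimal Python sketch that uses the standard-library html.parser on an abbreviated copy of the markup above. A program can pull the titles out as strings, but nothing in the markup says where or when each trial ran, so the specific question still cannot be answered by the machine. The class name TitleCollector is hypothetical.

```python
from html.parser import HTMLParser

# Abbreviated copy of the page markup shown above.
PAGE = (
    "<h2>Enrolling/Ongoing: </h2>"
    "<p>Clinical and Genetic Studies in Spinal Muscular Atrophy</p>"
    "<p>Prospective Study of Bone Abnormalities in SMA</p>"
    "<p><span><span>Pilot newborn screening project for identification and "
    "prospective followup of infants with spinal muscular atrophy</span></span></p>"
)


class TitleCollector(HTMLParser):
    """Collects the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside nested <span> tags still belongs to the current <p>.
        if self.in_p:
            self.titles[-1] += data


parser = TitleCollector()
parser.feed(PAGE)
for title in parser.titles:
    print(title)
# The result is only a list of strings: no start dates, no sites, no conditions.
```

Everything the question needs (site, dates, condition) would have to be guessed from free text, which is exactly the gap the RDF version below fills.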
ClinicalTrials.gov RDF/XML
• Semantic Web data for clinical trials
  – (1) http://static.linkedct.org/
  – (2) http://static.linkedct.org/page/trials/NCT00661453
Triples Triples Triples
• Triple statement: <s> <p> <o>
  – Subject (s): the resource
  – Predicate (p): the relationship
    • Often called the “property” in OWL
  – Object (o): the object of the relationship, either another resource or a literal value
• Example
  – (s) trial:NCT00661453
  – (p) linkedct:brief_title
  – (o) “CARNIVAL Type I: Valproic Acid and Carnitine in Infants With SMA Type I”
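A triple is just an ordered (subject, predicate, object) record. As a minimal Python sketch, the example statement above can be held in a NamedTuple; the Triple type is an illustrative name, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the relationship (the "property" in OWL)
    object: str     # another resource or a literal value


title = Triple(
    "trial:NCT00661453",
    "linkedct:brief_title",
    "CARNIVAL Type I: Valproic Acid and Carnitine in Infants With SMA Type I",
)
s, p, o = title  # a triple unpacks into its three parts
print(p)  # linkedct:brief_title
```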
Abbreviations
• For ease of readability
• trial:NCT00661453
  – “trial:” abbreviates the namespace “http://static.linkedct.org/resource/trials/”
  – “linkedct:” abbreviates the namespace “http://static.linkedct.org/resource/linkedct/”
  – So trial:NCT00661453 stands for http://static.linkedct.org/resource/trials/NCT00661453
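Expanding an abbreviation is simple string work: look up the prefix and prepend its namespace. The sketch below uses the two namespaces listed above; the helper names expand and abbreviate are hypothetical. RDF toolkits such as Jena provide their own prefix handling.

```python
PREFIXES = {
    "trial": "http://static.linkedct.org/resource/trials/",
    "linkedct": "http://static.linkedct.org/resource/linkedct/",
}


def expand(curie: str, prefixes: dict) -> str:
    """Expand an abbreviated name such as 'trial:NCT00661453' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def abbreviate(iri: str, prefixes: dict) -> str:
    """Inverse of expand: use the longest matching namespace, else keep the IRI."""
    best = None
    for prefix, ns in prefixes.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = prefix
    return iri if best is None else f"{best}:{iri[len(prefixes[best]):]}"


print(expand("trial:NCT00661453", PREFIXES))
# http://static.linkedct.org/resource/trials/NCT00661453
print(abbreviate("http://static.linkedct.org/resource/linkedct/brief_title", PREFIXES))
# linkedct:brief_title
```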
Triple Notations
• There are many
  – Turtle
  – RDF/XML
  – OWL
  – OBO
Triples Text
Subject              Predicate            Object
trial:NCT00661453    rdf:type             ct:trials
trial:NCT00661453    ct:brief_title       “CARNIVAL…”
trial:NCT00661453    ct:start_date        “April 2008”
trial:NCT00661453    ct:condition         cond:12347
cond:12347           rdf:type             ct:condition
cond:12347           ct:condition_name    “Spinal Muscular…”
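Triple Graph
[Figure: the triples above drawn as a graph with labeled edges]
• trial:NCT00661453 → rdf:type → ct:trial
• trial:NCT00661453 → ct:brief_title → “CARNIVAL Type I: Valproic Acid and Carnitine in Infants With Spinal Muscular Atrophy (SMA) Type I”
• trial:NCT00661453 → ct:start_date → “April 2008”
• trial:NCT00661453 → ct:condition → cond:12347
• cond:12347 → rdf:type → ct:condition
• cond:12347 → ct:condition_name → “Spinal Muscular Atrophy Type I”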
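Because every fact is a triple, a question becomes a set of triple patterns with variables, and answers are the variable bindings that satisfy all patterns at once. This is the core idea behind a SPARQL basic graph pattern. The Python sketch below is a toy matcher over the six triples above, not a SPARQL engine; the match function is hypothetical.

```python
TRIPLES = [
    ("trial:NCT00661453", "rdf:type", "ct:trials"),
    ("trial:NCT00661453", "ct:brief_title",
     "CARNIVAL Type I: Valproic Acid and Carnitine in Infants With "
     "Spinal Muscular Atrophy (SMA) Type I"),
    ("trial:NCT00661453", "ct:start_date", "April 2008"),
    ("trial:NCT00661453", "ct:condition", "cond:12347"),
    ("cond:12347", "rdf:type", "ct:condition"),
    ("cond:12347", "ct:condition_name", "Spinal Muscular Atrophy Type I"),
]


def is_var(term):
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns (joined on shared variables)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # An unbound variable takes the value; a bound one must agree with it.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Roughly the SPARQL query (prefix declarations omitted):
#   SELECT ?title ?date WHERE {
#     ?trial ct:condition ?cond .
#     ?cond  ct:condition_name "Spinal Muscular Atrophy Type I" .
#     ?trial ct:brief_title ?title .
#     ?trial ct:start_date ?date . }
query = [
    ("?trial", "ct:condition", "?cond"),
    ("?cond", "ct:condition_name", "Spinal Muscular Atrophy Type I"),
    ("?trial", "ct:brief_title", "?title"),
    ("?trial", "ct:start_date", "?date"),
]
for row in match(query, TRIPLES):
    print(row["?title"], "|", row["?date"])
```

Unlike the HTML page, the start date comes back as its own value, tied to the trial through the shared ?trial variable.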
RDF/XML
• (see the file at link #2)
<rdf:RDF…>
  <rdf:Description rdf:about="http://static.linkedct.org/resource/trials/NCT00481013">
    <linkedct:brief_title>Valproic Acid in Ambulant Adults With Spinal Muscular Atrophy</linkedct:brief_title>
    …
</rdf:RDF>
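RDF/XML is ordinary XML, so even the standard-library ElementTree can recover triples from the simple pattern above (an rdf:Description whose children hold literal values). This is a sketch only: the xmlns declarations are assumptions based on the namespaces listed earlier, and real RDF/XML has many more forms, so a proper RDF parser such as Jena should be used in practice. The function description_triples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A complete, minimal version of the abbreviated document above.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:linkedct="http://static.linkedct.org/resource/linkedct/">
  <rdf:Description rdf:about="http://static.linkedct.org/resource/trials/NCT00481013">
    <linkedct:brief_title>Valproic Acid in Ambulant Adults With Spinal Muscular Atrophy</linkedct:brief_title>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text):
    """Turn flat rdf:Description elements with literal children into (s, p, o) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them back into one IRI.
            ns, _, local = prop.tag[1:].partition("}")
            triples.append((subject, ns + local, (prop.text or "").strip()))
    return triples


for triple in description_triples(DOC):
    print(triple)
```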
Observations
• RDF is a standard supporting consistent data representation
• Rules about standards apply
  – Use existing standards whenever possible
Popular RDF Standards
• Friend of a Friend
  – alias = foaf
  – describes people and the links between them
• Dublin Core
  – alias = dc
  – “metadata” standard
• Simple Knowledge Organization System
  – alias = skos
  – terminology, thesauri, …
Data Federation
• Combine data from more than one data source
• Heterogeneous data
  – Not all data sources use the same standards
    • ds1.firstName
    • ds2.first_name
    • ds3.person_name
• Homogeneous data
  – All data sources use the same standards
    • ds1.firstName
    • ds2.firstName
    • ds3.firstName
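With heterogeneous sources, someone has to supply a mapping from each source’s field names to a shared name before rows can be combined. The Python sketch below does this with a hand-written dictionary; the records and the FIELD_MAP table are hypothetical, and it assumes ds3.person_name holds the same kind of value as the other two fields.

```python
# Hypothetical records from three sources that name the same field differently.
SOURCES = {
    "ds1": [{"firstName": "Ann"}],
    "ds2": [{"first_name": "Bo"}],
    "ds3": [{"person_name": "Cy"}],
}

# Hand-written mapping from each source's field name to one shared name.
FIELD_MAP = {
    "ds1": {"firstName": "firstName"},
    "ds2": {"first_name": "firstName"},
    "ds3": {"person_name": "firstName"},
}


def federate(sources, field_map):
    """Rename every source's fields to the shared vocabulary and combine the rows."""
    combined = []
    for name, rows in sources.items():
        mapping = field_map[name]
        for row in rows:
            renamed = {mapping.get(key, key): value for key, value in row.items()}
            renamed["source"] = name  # keep track of where each row came from
            combined.append(renamed)
    return combined


for row in federate(SOURCES, FIELD_MAP):
    print(row)
```

With homogeneous data, every entry in FIELD_MAP would be an identity mapping; that mapping table is the work heterogeneity creates. The alignment assertions that follow express the same mapping as RDF triples instead of code.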
Property Alignment Assertions
• ds1:firstName owl:equivalentProperty foaf:firstName
• ds2:first_name owl:equivalentProperty foaf:firstName
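An equivalentProperty assertion means a value stated with one property also holds for the other. As a simplified Python sketch, one pass over the data can copy each such triple under its equivalent property, so a single query on foaf:firstName reaches both sources. This covers only that one consequence of the OWL semantics; the person identifiers and the function name are hypothetical.

```python
ALIGNMENTS = [
    ("ds1:firstName", "owl:equivalentProperty", "foaf:firstName"),
    ("ds2:first_name", "owl:equivalentProperty", "foaf:firstName"),
]

# Hypothetical data from two sources.
DATA = [
    ("ds1:p1", "ds1:firstName", "Ann"),
    ("ds2:p7", "ds2:first_name", "Bo"),
]


def apply_equivalent_properties(data, alignments):
    """Add (s, q, o) for every (s, p, o) whose p is declared equivalent to q (either direction)."""
    pairs = [(a, b) for a, rel, b in alignments if rel == "owl:equivalentProperty"]
    result = set(data)
    for s, p, o in data:
        for a, b in pairs:
            if p == a:
                result.add((s, b, o))
            elif p == b:
                result.add((s, a, o))
    return result


inferred = apply_equivalent_properties(DATA, ALIGNMENTS)
print(sorted(o for s, p, o in inferred if p == "foaf:firstName"))  # ['Ann', 'Bo']
```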
Class Alignment Assertions
• ds1:Person owl:equivalentClass foaf:Person
• ds2:HumanBeing owl:equivalentClass foaf:Person
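Class alignment works the same way but acts on rdf:type statements: anything typed with one class is also an instance of its equivalent class. A minimal Python sketch under the same simplifications as property alignment (a single pass, hypothetical individuals):

```python
ALIGNMENTS = [
    ("ds1:Person", "owl:equivalentClass", "foaf:Person"),
    ("ds2:HumanBeing", "owl:equivalentClass", "foaf:Person"),
]

# Hypothetical individuals typed with each source's own class.
DATA = [
    ("ds1:p1", "rdf:type", "ds1:Person"),
    ("ds2:p7", "rdf:type", "ds2:HumanBeing"),
]


def apply_equivalent_classes(data, alignments):
    """Add (s, rdf:type, D) for every (s, rdf:type, C) where C is declared equivalent to D."""
    pairs = [(a, b) for a, rel, b in alignments if rel == "owl:equivalentClass"]
    result = set(data)
    for s, p, o in data:
        if p != "rdf:type":
            continue
        for a, b in pairs:
            if o == a:
                result.add((s, p, b))
            elif o == b:
                result.add((s, p, a))
    return result


people = sorted(
    s for s, p, o in apply_equivalent_classes(DATA, ALIGNMENTS)
    if p == "rdf:type" and o == "foaf:Person"
)
print(people)  # ['ds1:p1', 'ds2:p7']
```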
Rule-based Assertions
• Use rules to evaluate complicated “if-then” scenarios and assert the results
  – SWRL – Semantic Web Rule Language
  – JRL – Jena Rule Language
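An “if-then” rule reads existing triples and asserts new ones when its conditions hold. The Python sketch below hard-codes one such rule over the trial and condition triples from earlier. It is not SWRL or Jena rule syntax, and the class ex:SMATrial is hypothetical.

```python
DATA = {
    ("trial:NCT00661453", "ct:condition", "cond:12347"),
    ("cond:12347", "ct:condition_name", "Spinal Muscular Atrophy Type I"),
}


def sma_trial_rule(triples):
    """IF   ?t ct:condition ?c AND ?c ct:condition_name contains "Spinal Muscular Atrophy"
    THEN assert ?t rdf:type ex:SMATrial   (ex:SMATrial is a hypothetical class)."""
    names = {s: o for s, p, o in triples if p == "ct:condition_name"}
    return {
        (t, "rdf:type", "ex:SMATrial")
        for t, p, c in triples
        if p == "ct:condition" and "Spinal Muscular Atrophy" in names.get(c, "")
    }


new = sma_trial_rule(DATA)
print(new)  # {('trial:NCT00661453', 'rdf:type', 'ex:SMATrial')}
```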
Reasoning
• Computes assertions
• Adds new triple statements to the triple graph
• Implications
  – Data of interest must be read from all data sources to compute assertions
  – When data sources are large, this can take a long time and requires adequate computational resources
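A reasoner usually applies its rules repeatedly until a pass adds nothing new (a fixpoint), because one new triple can enable another rule. The Python sketch below does this for three OWL-style rules about owl:equivalentProperty (symmetry, transitivity, and copying values). Each pass scans the whole graph, which is why size and data access matter so much. It is a simplified illustration, not how Jena or any particular reasoner is implemented.

```python
def infer(triples):
    """Apply a few OWL-style rules repeatedly until no new triple appears (a fixpoint)."""
    graph = set(triples)
    while True:
        eq = {(a, b) for a, p, b in graph if p == "owl:equivalentProperty"}
        new = set()
        for a, b in eq:
            new.add((b, "owl:equivalentProperty", a))          # symmetry
            for c, d in eq:
                if b == c and a != d:
                    new.add((a, "owl:equivalentProperty", d))  # transitivity
        for s, p, o in graph:                                  # every pass reads all data
            for a, b in eq:
                if p == a:
                    new.add((s, b, o))                         # copy value to equivalent property
        new -= graph
        if not new:
            return graph
        graph |= new


DATA = {
    ("ds1:firstName", "owl:equivalentProperty", "foaf:firstName"),
    ("ds2:first_name", "owl:equivalentProperty", "foaf:firstName"),
    ("ds1:p1", "ds1:firstName", "Ann"),  # hypothetical person from ds1
}
graph = infer(DATA)
print(sorted(p for s, p, o in graph if s == "ds1:p1"))
```

The last assertion needs two rounds (ds1:firstName to foaf:firstName, then on to ds2:first_name), which a single pass would miss.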
Use Case
• Combine clinical trial data with patient data
• SMA trial data from ClinicalTrials.gov (linkedct.org) with patient demographics for 5 different trials
Resources
• W3Schools
  – http://www.w3schools.com/semweb/default.asp
• W3C web sites
  – http://www.w3.org/standards/semanticweb/
  – http://www.w3.org/RDF/
  – http://www.w3.org/standards/techs/owl#w3c_all
• Safari Books
  – http://proquest.safaribooksonline.com
  – Semantic Web Programming
  – Semantic Web for the Working Ontologist
• Jena Java API
• Protégé
• D2R
Entity Relationship Diagram
TRIAL (TRIAL_ID, BRIEF_TITLE, CONDITION_ID, START_DATE)
CONDITION (CONDITION_ID, CONDITION_NAME)
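The relational design above maps naturally onto triples: each row becomes a subject, each column a predicate, and the CONDITION_ID foreign key becomes a link between resources. Tools such as D2R automate this with a mapping file. The Python sketch below is a hand-rolled illustration of the idea, not D2R’s mapping language. The table and column names come from the diagram, the row values from the triples shown earlier, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONDITION (CONDITION_ID INTEGER PRIMARY KEY, CONDITION_NAME TEXT);
CREATE TABLE TRIAL (
    TRIAL_ID TEXT PRIMARY KEY,
    BRIEF_TITLE TEXT,
    CONDITION_ID INTEGER REFERENCES CONDITION(CONDITION_ID),
    START_DATE TEXT
);
INSERT INTO CONDITION VALUES (12347, 'Spinal Muscular Atrophy Type I');
INSERT INTO TRIAL VALUES ('NCT00661453',
    'CARNIVAL Type I: Valproic Acid and Carnitine in Infants With Spinal Muscular Atrophy (SMA) Type I',
    12347, 'April 2008');
""")


def rows_to_triples(conn):
    """Map each row to a subject and each column to a predicate."""
    triples = []
    for trial_id, title, cond_id, start in conn.execute(
        "SELECT TRIAL_ID, BRIEF_TITLE, CONDITION_ID, START_DATE FROM TRIAL"
    ):
        s = f"trial:{trial_id}"
        triples += [
            (s, "rdf:type", "ct:trials"),
            (s, "ct:brief_title", title),
            (s, "ct:start_date", start),
            (s, "ct:condition", f"cond:{cond_id}"),  # foreign key becomes a link
        ]
    for cond_id, name in conn.execute("SELECT CONDITION_ID, CONDITION_NAME FROM CONDITION"):
        s = f"cond:{cond_id}"
        triples += [(s, "rdf:type", "ct:condition"), (s, "ct:condition_name", name)]
    return triples


for t in rows_to_triples(conn):
    print(t)
```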