Translational Medicine from a Semantic Web Perspective Eric Neumann W3C June 16, 2006.
2
Drug Discovery and Medicine
Hygieia, G. Klimt
• Health
• Practice
• Safety
• Prevention
• Privacy
• Knowledge
3
Data Expansion: Large Data Sets, Variables >> Samples
Many New Data Types: Which Formats?
Combine
4
Where Information Advances are Most Needed
• Supporting Innovative Applications in R&D
  – Translational Medicine (Biomarkers)
  – Molecular Mechanisms (Systems)
  – Data Provenance, Rich Annotation
• Clinical Information
  – eHealth Records, EDC, Clinical Submission Documents
  – Safety Information, Pharmacovigilance, Adverse Events, Biomarker data
• Standards
  – Central Data Sources
    • Genomics, Diseases, Chemistry, Toxicology
  – MetaData
    • Ontologies
    • Vocabularies
5
Knowledge
“[Knowledge] is the human acquired capacity (both potential and actual) to take effective action in varied and uncertain situations.”
How does this translate into using Information Systems better in support of Innovation?
6
Knowledge Predictiveness
• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles
Drug Discovery Challenges
7
Current Challenges: Drug Discovery
• Business
  – Costly, lengthy drug discovery process (12-14 years)
  – Poor funding to find new uses for existing therapies (e.g., antibiotics)
  – Insufficient economic drivers for certain disease areas
  – Discovery and clinical trial design are not well aligned with anticipating and detecting adverse effects
    • Post-launch surveillance is weak
• Science & Technology
  – Counteracting the legacy of “Silos”
  – How to break away from the drug discovery “conveyor belt model” to the “Translation model”
    • Gaining and sharing insights throughout the process
  – The Benefit of New Targets for New Diseases
  – How best to identify safety and efficacy issues early on, so that cost and failure are reduced
    • A D3 Knowledge-base: Druggability and Safety
8
The Big Picture -
Hard to understand from just a few Points of View
9
10
Complete view tells a very different Story
11
Distributed Nature of R&D
Silos of Data…
12
Static, Untagged, Disjoint
Existing Web Data Throttles the R&D Potential
R&D Scientist Integrating Data Manually
LIMS, Bioinformatics, Cheminformatics, Public Data Sources
13
Data Integration: Biology Requirements
Disease, Proteins, Genes, Papers
Retention Policy
Audit Trail
Curation Tools, Ontology, Experiment
Assays
Compounds
14
Semantic Web Data Integration
R&D Scientist
Bioinformatics, Cheminformatics, LIMS, Public Data Sources
Dynamic, Linked, Searchable
15
Decision Support
Translational Research
Toxicity
New Applications
Safety
Target Validation
Biomarker Qualification
GO
BioPAX
ICH
Raw Data
MAGE-ML
ASN.1
XLS
PSI XML
CSV
SAS Tables
CDISC
Semantic Bridge
16
Key Technologies Pharmaceutical Companies Use to Exchange Knowledge
17
New Regulatory Issues Confronting Pharmaceuticals
From “Innovation or Stagnation”, FDA Report, March 2004
Tox/Efficacy, ADME Optim
18
Key Functionality
• Ubiquity
  – Same identifiers for anything from anywhere
• Discoverability
  – Global search on any entity
• Interoperability
  – Enables application independence: “Recombinant Data”
19
Additional Functionality
• Provenance
  – Origin and history of data and annotations
• Scalability
  – Over all potentially relevant data and content
• Authentication/Security
  – Single user and team identity and granular data security
  – Non-repudiation of authorship
  – Encryption of graphs
  – Policy Awareness
• Data Preservation
  – Long-term persistence by minimizing API needs
20
Translational Research and Personalized Medicine
Research Practice
Clinical
Biological
Biomedical Research
Clinical Practice
Clinical Research
Personalized Medicine
Translational Medicine
• Two significant areas of HCLS activity
• Span most areas of activity
21
HCLS Framework: Biomedical Research
• Molecular, Cellular and Systems Biology/Physiology
  – Organism as an integrated and interacting network of genes, proteins and biochemical reactions
  – Human body as a system of interacting organs
• Molecular Cell Biology/Genomic and Proteomic Research
  – Gene Sequencing, Genotyping, Protein Structures
  – Cell Signaling and other Pathways
• Biomarker Research
  – Discovery of genes and gene products that can be used to measure disease progression or the impacts of drugs
• Pharmacogenomics
  – Impact of genetic inheritance on
• Drug Discovery and Translational Research
  – Use of preclinical research to identify promising drug candidates
22
HCLS Framework: Clinical Research
• Clinical Trials
  – Determination of efficacy, impact and safety of drugs for particular diseases
• Pharmacovigilance/ADE Surveillance
  – Monitoring of impacts of drugs on patients, especially safety and adverse event related information
• Patient Cohort Identification and Management
  – Identifying patient cohorts for drug trials is a challenging task
• Translational Research
  – Test theories emerging from preclinical experimentation on disease-affected human subjects
• Development of EHRs/EMRs for both clinical research and practice
  – Currently, EHRs/EMRs are focused on clinical workflow processes
  – Re-using that information for clinical research and trials is a challenging task
24
Translational Research
• Improve communication between basic and clinical science so that more therapeutic insights may be derived from new scientific ideas - and vice versa.
• Testing of theories emerging from preclinical experimentation on disease-affected human subjects.
• Information obtained from preliminary human experimentation can be used to refine our understanding of the biological principles underpinning the heterogeneity of human disease and polymorphism(s).
• http://www.translational-medicine.com/info/about
• Reference NIH Digital Roadmap activity
27
Personalized Medicine
• Propagation of insights from Genomic research into clinical practice
• Impact of new Molecular diagnostic tests hitting the market
  – How can they be incorporated into clinical care?
  – How does one update current clinical guidelines to incorporate the use of these tests?
  – How can one enable novel clinical decision support?
• How can phenotypic characteristics and genomic markers be used to:
  – Stratify patient populations
  – “Personalize” clinical care
• Genetic test results as risk factors
• Therapeutic use of genomic markers
29
Ecosystem: Current State
Pharmaceutical Companies
Clinical Research Organizations (CROs)
FDA, National Institutes of Health
Hospitals
Universities, Academic Medical Centers (AMCs)
Characterized by silos with uncoordinated supply chains leading to inefficiencies in the system
Centers for Disease Control
Hospitals, Doctors
Payors
Patients, Public
Biomedical Research, Clinical Practice
Clinical Trials/Research, Clinical Practice
30
Ecosystem: Goal State
Biomedical Research, Clinical Practice
32
Use Case Flow: Drug Discovery and Development
Qualified Targets
Lead Generation
Toxicity & Safety
Biomarkers
Pharmacogenomics
Clinical Trials
Molecular Mechanisms
Lead Optimization
KD
33
Drug Discovery & Development Knowledge
Qualified Targets
Lead Generation
Toxicity & Safety
Biomarkers
Pharmacogenomics
Clinical Trials
Molecular Mechanisms
Lead Optimization
Launch
35
Semantic Web Application Space in Drug Discovery and Development
Genomics
Therapeutics
Biology
HTS
NDA
Compound Opt
safety
eADME
DMPK
informatics
manufacturing
genes
Clinical Studies
Patent
Chem Lib
Production
Critical Path
36
Opportunities for Semantics in Healthcare
• Enhanced interoperability via:
  – Semantic Tagging
  – Grounding of concepts in Standardized Vocabularies
  – Complex Definitions
• Semantics-based Observation Capture
• Inference on Diseases
  – Phenotypes
  – Genetics
  – Mechanisms
• Semantics-based Clinical Decision Support
  – Guided Data Interpretation
  – Guided Ordering
• Semantics-based Knowledge Management
37
Data Semantics in the Life Sciences
Text
Unstructured Data Types
Structured and Complex Data Types
Histology Profiling
Publications
Image + Text
Publications + data
Text + data items
genomics
Gene expression
Data Items
Clinical Findings
Categorical/Taxonomic Data Items
Pathways, Biomarkers
Complex Objects
Clinical trials
Complex Objects with Categorical/Taxonomic Data Items
Systems Biology
Composite Objects with Embedded “process”
39
RDB => RDF
Virtualized RDF
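One way to read “RDB => RDF” is as a direct mapping: each row becomes a subject URI, each column a property. The sketch below, using only Python's built-in sqlite3, materializes such triples as N-Triples. The table, its columns, and the http://example.org/ base URI are hypothetical; the sample values are borrowed from the kenpaullone record in this deck. A virtualized approach would instead translate queries against the live database rather than copying the data out; that is not shown here.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for minted resources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_ntriples(conn, table, key):
    """Map each row of `table` to N-Triples lines.

    The `key` column becomes the subject URI; every other non-NULL
    column becomes a property with a plain literal value. `table` is
    assumed to be a trusted identifier (it is interpolated into SQL).
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{table}> .")
        for col in columns:
            if col == key or record[col] is None:
                continue
            # Escape backslashes and quotes so the literal stays valid N-Triples
            text = str(record[col]).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{table}#{col}> "{text}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id TEXT PRIMARY KEY, name TEXT, mw REAL)")
conn.execute("INSERT INTO compound VALUES ('3820', 'kenpaullone', 327.17)")
for line in table_to_ntriples(conn, "compound", "id"):
    print(line)
```

Real mappings would also attach datatypes (e.g. xsd:decimal for mw) instead of emitting every value as a plain string.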
42
Use-Case: COSA
Row Semantic <rdf:type Subject>
Data Set
Column Semantic <rdf:type Gene>
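The row and column semantics above can be sketched by typing every row as a Subject and every column as a Gene, then giving each cell its own measurement node that points back to both. The gene names, values, the `ex:` predicates, and the `row/gene` ID scheme are all hypothetical; plain Python tuples stand in for an RDF store.

```python
# Hypothetical data set: each row is a subject, each column a gene
genes = ["GSK3B", "APOA1"]
data = {
    "subject01": [2.1, 0.4],
    "subject02": [1.7, 0.9],
}

ROW_TYPE = "Subject"  # row semantic: <rdf:type Subject>
COLUMN_TYPE = "Gene"  # column semantic: <rdf:type Gene>


def annotate(data, genes):
    """Turn a rows x columns matrix into triples, typing rows and columns
    and giving every cell its own measurement node."""
    triples = set()
    for gene in genes:
        triples.add((gene, "rdf:type", COLUMN_TYPE))
    for row_id, values in data.items():
        triples.add((row_id, "rdf:type", ROW_TYPE))
        for gene, value in zip(genes, values):
            cell = f"{row_id}/{gene}"  # hypothetical ID scheme for a measurement
            triples.add((cell, "ex:ofSubject", row_id))
            triples.add((cell, "ex:ofGene", gene))
            triples.add((cell, "ex:value", value))
    return triples


triples = annotate(data, genes)
# Which genes were measured for subject01?
measured = sorted(
    o for s, p, o in triples
    if p == "ex:ofGene" and (s, "ex:ofSubject", "subject01") in triples
)
print(measured)  # ['APOA1', 'GSK3B']
```

Once rows and columns carry types, the same query works no matter which spreadsheet or instrument the matrix came from.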
43
Use-Case:Experimental Design Definition
Treatment W
Control
Time Points
Staining
Visible Microscopy
Fluorescent Microscopy
Cultured Cells
Treatment Z
Image Analysis
44
Case Study: Drug Safety ‘Safety Lenses’
• Lenses can ‘focus’ data in specific ways
  – Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each different toxicity issue
• Integrate just-in-time (JIT) information on newly published mechanisms of action
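A lens, as described above, can be thought of as a reusable view: it focuses a pool of findings on one toxicity topic for one project and groups them so that animal and clinical evidence sit side by side. This is a minimal sketch under that reading; the findings, sources, and the `make_lens` helper are hypothetical, and a real lens would run over RDF rather than Python tuples.

```python
from collections import defaultdict

# Hypothetical findings: (project, finding_id, toxicity_topic, evidence_source)
findings = [
    ("GSK3 Team", "f1", "hepatotoxicity", "rat study"),
    ("GSK3 Team", "f2", "hERG", "in vitro assay"),
    ("GSK3 Team", "f3", "hepatotoxicity", "Phase 1"),
    ("Other Team", "f4", "hepatotoxicity", "rat study"),
]


def make_lens(topic):
    """Build a lens: a reusable view that focuses findings on one toxicity topic."""
    def view(records, project):
        grouped = defaultdict(list)
        for proj, finding_id, tox_topic, source in records:
            if proj == project and tox_topic == topic:
                grouped[source].append(finding_id)
        # Grouping by evidence source puts animal and clinical results side by side
        return dict(grouped)
    return view


hepatotox_lens = make_lens("hepatotoxicity")
print(hepatotox_lens(findings, "GSK3 Team"))
# {'rat study': ['f1'], 'Phase 1': ['f3']}
```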
45
Courtesy of BG-Medicine
Example: Knowledge Aggregation
46
Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject Verb Object
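The ApoA1 statements above map directly onto (subject, predicate, object) triples, and a single pattern-matching function is enough to query them. The predicate names are illustrative, not taken from a published vocabulary.

```python
# Statements from the slide, as (subject, predicate, object) triples.
# Predicate names are illustrative, not from a published vocabulary.
facts = [
    ("ApoA1", "isProducedBy", "Liver"),
    ("ApoA1", "isExpressedLessIn", "Atherosclerotic Liver"),
    ("ApoA1", "isCorrelatedWith", "DKK1"),
    ("ApoA1", "isCitedRegarding", "Tangier's disease"),
    ("ApoA1", "hasTxRegElement", "HNFR1"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


print(match(facts, p="isCorrelatedWith"))  # [('ApoA1', 'isCorrelatedWith', 'DKK1')]
print(len(match(facts, s="ApoA1")))        # 5
```

Because every statement has the same shape, new facts from other sources can be appended without changing the query code.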
48
Scenario: Biomarker Qualification
• Biomarker Roles
  – Disease
  – Toxicity
  – Efficacy
• Molecular and cytological markers
  – Tissue-specific
  – High content screening derived information
  – Different sets associated with different predictive tools
• Statistical discrimination based on selected samples
  – Predictive power
  – Alternative cluster prediction algorithms
  – Support qualifications from multiple studies (comparisons)
• Causal mechanisms
  – Pathways
  – Population variation
49
BioMarker Semantics
Disease Pathways
Significance & Strength
+Samples, -Samples
Biomarker Set
50
Scenario: Toxicity
• Mechanisms
  – Tissue-selective, Species-specific
  – Pathways, Off-Targets
  – Metabolites, PK sensitivity
• Evidence
  – Biomarkers
    • In vitro assays (cell lines), Animal models, Clinical Phase 1
  – Literature
• Population Variation
  – Drug Metabolism to toxic forms (CYP, SULT, UGT)
  – Target interaction variability
  – Potential vs. Demonstrated
• Predictions
  – Data Mining Patterns
  – Computational Modeling
• Working Solutions
  – Chemical modifications
  – Dosing, Reformulation
  – Documented animal <=> human similarity and variation
51
Knowledge Mining using Semantic Web
“Gene Prioritization through Genomic Data Fusion”
- Aerts et al., 2006, Nature Biotechnology
- Uses quantitative and qualitative information for statistical ranking
- Can be used to identify novel genes involved in diseases
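The idea of fusing several per-source rankings into one prioritized list can be sketched with a mean-rank aggregate. Note that Aerts et al. combine rankings with order statistics; the simpler average-rank rule below is a swapped-in substitute chosen for brevity, and the gene names and rankings are hypothetical.

```python
def mean_rank_fusion(rankings):
    """Combine several best-first gene lists into one ranking by average rank.

    Genes missing from a list receive that list's length + 1 as a penalty rank.
    (Simplified substitute for the order-statistics fusion used by Aerts et al.)
    """
    genes = {g for ranking in rankings for g in ranking}
    scores = {}
    for gene in genes:
        ranks = [ranking.index(gene) + 1 if gene in ranking else len(ranking) + 1
                 for ranking in rankings]
        scores[gene] = sum(ranks) / len(ranks)
    # Lower mean rank is better; ties broken alphabetically for determinism
    return sorted(genes, key=lambda g: (scores[g], g)), scores


# Hypothetical per-source rankings (e.g. expression, literature, pathways)
sources = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_A", "GENE_D"],
    ["GENE_B", "GENE_C", "GENE_A"],
]
order, scores = mean_rank_fusion(sources)
print(order)  # ['GENE_B', 'GENE_A', 'GENE_C', 'GENE_D']
```

GENE_B wins with ranks 2, 1, 1 (mean 1.33), even though another gene topped the first source.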
52
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Case Study: BioPAX (Pathways)
54
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Case Study: BioPAX (Pathways)
Modulation
CHIR99102
affectedBy
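Adding a single drug:affectedBy property is enough to link a pathway step to a compound, and the link can then be read back as an ordinary triple. The sketch below uses Python's xml.etree on a trimmed, namespace-declared version of the MODULATION node. The `bp` and `drug` namespace URIs are placeholders (not the official BioPAX namespace), and the flattening handles only this simple shape, not full RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://example.org/bp#"      # placeholder, not the official BioPAX namespace
DRUG = "http://example.org/drug#"  # hypothetical drug-annotation namespace

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}" xmlns:drug="{DRUG}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <bp:control-type>INHIBITION</bp:control-type>
    <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>"""


def expand(tag):
    """ElementTree writes names as '{namespace}local'; join them into one URI."""
    return tag[1:].replace("}", "", 1)


def to_triples(xml_text):
    """Flatten typed nodes with simple property elements into
    (subject, predicate, object) tuples. Base-URI resolution of rdf:ID,
    nested nodes, and most other RDF/XML features are deliberately omitted."""
    root = ET.fromstring(xml_text)
    out = []
    for node in root:
        subject = "#" + node.get(f"{{{RDF}}}ID")
        out.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            out.append((subject, expand(prop.tag), value))
    return out


triples = to_triples(DOC)
affected = [(s, o) for s, p, o in triples if p == DRUG + "affectedBy"]
print(affected)  # [('#xDshToXGSK3b', 'http://pharma.com/cmpd/CHIR99102')]
```

For production use, a full RDF parser is the safer choice; this sketch only shows that the added link is machine-readable without changing the rest of the pathway record.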
55
Potential Linked Clinical Ontologies
Clinical Trials Ontology
RCRIM (HL7)
Genomics
CDISC
IRB
Applications
Molecules
Clinical Obs
ICD-10
Pathways (BioPAX)
Disease Models
Extant ontologies
Mechanisms
Under development
Bridge concept
SNOMED
Disease Descriptions
Tox
56
Case Study: Drug Discovery Dashboards
• Dashboards and Project Reports
• Next-generation browsers for semantic information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic style sheets
add { ls:TheraTopic hs:classView:TopicView}
57
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Target: GSK3beta
Disease: DiabetesT2
Alt Dis: Alzheimers
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
58
Bridging Chemistry and Molecular Biology
urn:lsid:uniprot.org:uniprot:P49841
Semantic Lenses: Different Views of the same data
Apply Correspondence Rule:
  if ?target.xref.lsid == ?bpx:prot.xref.lsid
  then ?target.correspondsTo.?bpx:prot
BioPax Components
Target Model
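The correspondence rule above is essentially a join on shared LSID cross-references. A minimal sketch in Python: only the GSK3beta UniProt LSID (urn:lsid:uniprot.org:uniprot:P49841) comes from the slide; the other records and the `example.org` LSIDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; only the UniProt LSID for GSK3beta comes from the slide
targets = [
    {"id": "target:GSK3beta", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
    {"id": "target:ExampleKinase", "xref_lsid": "urn:lsid:example.org:protein:0001"},
]
bpx_proteins = [
    {"id": "bpx:xGSK-3beta", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
    {"id": "bpx:xDsh", "xref_lsid": "urn:lsid:example.org:protein:0002"},
]


def apply_correspondence(targets, proteins):
    """if ?target.xref.lsid == ?prot.xref.lsid then ?target correspondsTo ?prot"""
    by_lsid = defaultdict(list)
    for prot in proteins:
        by_lsid[prot["xref_lsid"]].append(prot["id"])
    return [
        (t["id"], "correspondsTo", prot_id)
        for t in targets
        for prot_id in by_lsid.get(t["xref_lsid"], [])
    ]


print(apply_correspondence(targets, bpx_proteins))
# [('target:GSK3beta', 'correspondsTo', 'bpx:xGSK-3beta')]
```

Indexing one side by LSID keeps the rule linear in the number of records instead of comparing every pair.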
59
• Lenses can aggregate, accentuate, or even analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references
Bridging Chemistry and Molecular Biology
60
Pathway Polymorphisms
• Merge directly onto pathway graph
• Identify targets with lowest chance of genetic variance
• Predict parts of pathways with highest functional variability
• Map genetic influence to potential pathway elements
• Select mechanisms of action that are minimally impacted by polymorphisms
Non-synonymous polymorphisms from dbSNP
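Ranking pathway elements by how many non-synonymous polymorphisms map onto them is one simple way to “identify targets with lowest chance of genetic variance”. In this sketch the pathway element names follow the WNT example in this deck, while the SNP records are hypothetical stand-ins for dbSNP data.

```python
from collections import Counter

# Pathway elements following the deck's WNT example
pathway = ["Dsh", "GSK3beta", "beta-catenin"]

# Hypothetical (snp_id, gene, consequence) records standing in for dbSNP data
snps = [
    ("snp1", "Dsh", "missense"),
    ("snp2", "Dsh", "synonymous"),
    ("snp3", "beta-catenin", "missense"),
    ("snp4", "beta-catenin", "nonsense"),
    ("snp5", "GSK3beta", "synonymous"),
]

NON_SYNONYMOUS = {"missense", "nonsense"}


def rank_by_variability(pathway, snps):
    """Return (count, element) pairs, fewest non-synonymous SNPs first."""
    counts = Counter(
        gene for _, gene, effect in snps
        if effect in NON_SYNONYMOUS and gene in pathway
    )
    # counts[g] is 0 for elements with no recorded non-synonymous SNPs
    return sorted((counts[g], g) for g in pathway)


ranking = rank_by_variability(pathway, snps)
print(ranking)  # [(0, 'GSK3beta'), (1, 'Dsh'), (2, 'beta-catenin')]
```

Raw counts ignore allele frequency and functional impact; weighting each SNP by those would be a natural refinement.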
61
Knowledge Channels
<item rdf:about="http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01">
  <title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.</title>
  <link>http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01</link>
  <description>Posted by hannahr to CLLSignalling&amp;Processes on Thu Jan 19 2006</description>
  <dc:creator>hannahr</dc:creator>
  <dc:date>2006-01-19T11:24:03Z</dc:date>
  <dc:subject>CLLSignalling&amp;Processes</dc:subject>
  <connotea:uri>
    <dc:title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.</dc:title>
    <dc:creator>A Sainz-Perez</dc:creator>
    <dc:creator>H Gary-Gouy</dc:creator>
    <dc:identifier>
      <connotea:PubMedID>
        <connotea:idValue>16408101</connotea:idValue>
        <rdf:value>PMID: 16408101</rdf:value>
      </connotea:PubMedID>
    </dc:identifier>
    <dc:date>2006-01-12</dc:date>
    <prism:publicationName>Leukemia</prism:publicationName>
    <prism:issn>0887-6924</prism:issn>
  </connotea:uri>
</item>
62
Knowledge Channels
<item rdf:about="http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01">
  <title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.</title>
  <link>http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01</link>
  <description>Posted by hannahr to CLLSignalling&amp;Processes on Thu Jan 19 2006</description>
  <dc:creator>hannahr</dc:creator>
  <dc:date>2006-01-19T11:24:03Z</dc:date>
  <dc:subject>CLLSignalling&amp;Processes</dc:subject>
  <kn:nugget rdf:resource="#N251">
    <tn:expert>Giles Day</tn:expert>
    <tn:topic>pf#P38</tn:topic>
    <tn:kChannel>pf#Kinases</tn:kChannel>
    <tn:comment>This paper suggests a mechanism for P38 protection of CLL B-cells</tn:comment>
  </kn:nugget>
  <connotea:uri>
    <dc:title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic leukemia.</dc:title>
    <dc:creator>A Sainz-Perez</dc:creator>
    <dc:creator>H Gary-Gouy</dc:creator>
    <dc:identifier>
      <connotea:PubMedID>
        <connotea:idValue>16408101</connotea:idValue>
        <rdf:value>PMID: 16408101</rdf:value>
      </connotea:PubMedID>
    </dc:identifier>
    <dc:date>2006-01-12</dc:date>
    <prism:publicationName>Leukemia</prism:publicationName>
    <prism:issn>0887-6924</prism:issn>
  </connotea:uri>
</item>
P38 paper
N251
Giles Day
pf#P38
pf#Kinases
nugget
expert
topic
kChannel
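Once the nugget is expressed as triples (paper, nugget, expert, topic, kChannel), a knowledge channel is just a query: which papers carry a nugget filed under a given channel, and who annotated them? The first nugget follows the slide; the second (N252, its expert, and pf#Proteases) is hypothetical, added only so the filter has something to exclude.

```python
# Triples from the slide's nugget; N252 and its values are hypothetical
triples = [
    ("P38 paper", "nugget", "N251"),
    ("N251", "expert", "Giles Day"),
    ("N251", "topic", "pf#P38"),
    ("N251", "kChannel", "pf#Kinases"),
    ("Other paper", "nugget", "N252"),
    ("N252", "expert", "Expert X"),
    ("N252", "kChannel", "pf#Proteases"),
]


def channel_feed(triples, channel):
    """List (paper, expert) pairs whose nuggets are filed under `channel`."""
    nuggets = {s for s, p, o in triples if p == "kChannel" and o == channel}
    experts = {s: o for s, p, o in triples if p == "expert"}
    return sorted(
        (s, experts.get(o)) for s, p, o in triples
        if p == "nugget" and o in nuggets
    )


print(channel_feed(triples, "pf#Kinases"))  # [('P38 paper', 'Giles Day')]
```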
65
GeneLogic GeneExpress Data
• Additional relations and aspects can also be defined
Diseased Tissue
Links to OMIM (RDF)
66
Bar View of GeneExpress
67
ClinDash: Clinical Trials Browser
Clinical Obs
Expression Data
Subjects
• Values can be normalized across all measurables (rows)
• Samples can be aligned to their subjects using RDF rules
• Clustering can now be done over all measurables (rows)
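The first two steps above can be sketched in plain Python: z-score each measurable (row) so rows with different units become comparable, then group sample columns under their subjects. The measurables, values, and sample-to-subject links are hypothetical; the `taken_from` mapping stands in for what an RDF rule over (sample, takenFrom, subject) triples might produce.

```python
from statistics import mean, pstdev

# Hypothetical measurables (rows) across samples (columns)
samples = ["s1", "s2", "s3"]
rows = {
    "ALT": [30.0, 45.0, 60.0],
    "GSK3B_expr": [1.0, 2.0, 3.0],
}
# Hypothetical sample -> subject links, as a rule over
# (sample, takenFrom, subject) triples might produce
taken_from = {"s1": "subjectA", "s2": "subjectA", "s3": "subjectB"}


def zscore_rows(rows):
    """Normalize each row to mean 0 and unit (population) standard deviation."""
    normalized = {}
    for name, values in rows.items():
        mu, sd = mean(values), pstdev(values)
        normalized[name] = [(v - mu) / sd if sd else 0.0 for v in values]
    return normalized


def columns_by_subject(samples, taken_from):
    """Group sample column indices under the subject each sample came from."""
    groups = {}
    for index, sample in enumerate(samples):
        groups.setdefault(taken_from[sample], []).append(index)
    return groups


z = zscore_rows(rows)
groups = columns_by_subject(samples, taken_from)
print(groups)  # {'subjectA': [0, 1], 'subjectB': [2]}
```

After normalization, a clinical chemistry value and a gene expression level sit on the same scale, which is what makes clustering over all rows meaningful.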
68
69
70
71
76
W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA,
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
77
HCLS Objectives
• Share use cases, applications, demonstrations, experiences
• Exposing collections
• Developing vocabularies
• Building / extending (where appropriate) core vocabularies for data integration
78
HCLS Activities
• BioRDF - data + NLP as RDF
• BioONT - ontology coordination
• Scientific Publishing - evidence management
• Adaptive Clinical Protocols and Pathways
• Clinical Trials
79
BioRDF: NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation are directly related to the freedom to legally reuse and technically transform scientific information.
2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner.
3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.
80
BioRDF: Reagents
RDF resources that describe various kinds of experimental reagents, starting with antibodies:
• Initial RDF that captures: the gene, the fact that this is an antibody, various kinds of pages about the antibody (such as vendor documentation), and any other properties that are explicitly captured in the source material
• Work with the Ontology task force to identify appropriate ontologies and vocabularies to use in the RDF
• Write queries against the RDF to answer questions of the sort posed on the Alzforum's
81
BioRDF: NCBI
• NCBI Data: as URIs and as RDF
• Terminology Integration: NLM’s UMLS, MeSH
  – SNOMED
• Olivier Bodenreider
82
BioRDF Neuro Tasks
• Aggregate facts and models around Parkinson’s Disease
• BIRN / Human Brain Project
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
  – ‘Brain Connectivity’
  – Neuronal data in SenseLab
89
What does RDF get you?
• Structure is not format-rigid (i.e., not locked into a tree)
  – Semantics not implicit in Syntax
  – No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
  – Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
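Merging graphs is what makes “recombinant” queries possible: neither source alone can say which pathway a compound touches, but their union can. In this sketch the facts echo examples in this deck (kenpaullone against GSK3beta; GSK3beta in WNT), while the predicate names are hypothetical. With only named nodes, an RDF merge is a plain set union; graphs containing blank nodes would first need their blank nodes renamed apart.

```python
# Two independently published graphs (facts echo examples in this deck)
chemistry = {
    ("kenpaullone", "rdf:type", "Compound"),
    ("kenpaullone", "inhibits", "GSK3beta"),
}
biology = {
    ("GSK3beta", "rdf:type", "Protein"),
    ("GSK3beta", "participatesIn", "WNT"),
}

# With only named nodes (no blank nodes), merging is a set union of triples
merged = chemistry | biology


def pathways_hit_by(compound, graph):
    """Follow inhibits -> participatesIn across whatever graph is supplied."""
    targets = {o for s, p, o in graph if s == compound and p == "inhibits"}
    return sorted({o for s, p, o in graph if s in targets and p == "participatesIn"})


print(pathways_hit_by("kenpaullone", chemistry))  # []
print(pathways_hit_by("kenpaullone", merged))     # ['WNT']
```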
90
RDF vs. XML example
Wang et al., Nature Biotechnology, Sept 2005
AGML
HUPML
91
RDF Stripe Mode
Node>Edge>Node>Edge….
92
RDF Graph
94
gsk:KENPAL rdf:type :Compound ;
    dc:source <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171> ;
    :chemID "3820" ;
    :clogP "2.4" ;
    :kA "e-8" ;
    :mw "327.17" ;
    :ic50 [ rdf:type :IC50 ; :value "23" ; :units :nM ; :forTarget gsk:GSK3beta ] ;
    :chemStructure "C16H11BrN2O" ;
    rdfs:label "kenpaullone" ;
    :synonym "bromo-paullone" ;
    :smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)Br" ;
    :inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)19-14/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
    :xref <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3820> .
95
Multiple Ontologies Used Together
Drug Target Ontology
FOAF
Patent Ontology
OMIM
Person
Group
Chemical Entity
Disease
SNP
BioPAX
UniProt
Extant ontologies
Protein
Under development
Bridge concept
UMLS
Disease Polymorphisms
PubChem
96
Case Studies
97
Case Study: NeuroCommons.org
• Public Data & Knowledge for CNS
• R&D Forum
• Available for industry and academia
• All based on Semantic Web Standards
102
HCLS Neuro Tasks
• Aggregate facts and models around Parkinson’s Disease
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
  – Brain scans in The Whole Brain Atlas
  – Neural entries in NCBI’s Entrez Gene Database
  – ‘Brain Connectivity’
  – Neuronal data in SenseLab
  – Neurological Disease entries in OMIM
104
Conclusions:Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links: network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"
106
What is the Semantic Web?
• http://www.w3.org/2006/Talks/0125-hclsig-em/
It’s AI
It’s Web 2.0
It’s Ontologies
It’s Data Tracking
It’s a Global Conspiracy
It’s Semantic Webs
It’s Text Extraction
107
W3C Roadmap
• Semantic Web foundation specifications: RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004
• Standardization work is underway in Query, Best Practices and Rules
• Goal of moving from a Web of Documents to a Web of Data
The Only Open and Web-based Data Integration Model Game in Town
108
The Current Web
What the computer sees: “Dumb” links
No semantics - <a href> treated just like <bold>
Minimal machine-processable information
109
The Semantic Web
Machine-processable semantic information
Semantic context published – making the data more informative to both humans and machines
110
Google Graphs
Ranking Sites based on Topology
Associate Word frequencies with ranked sites
111
The Technologies: RDF
• Resource Description Framework
• W3C standard for making statements of fact or belief about data or concepts
• Descriptive statements are expressed as triples: (Subject, Verb, Object)
  – We call the verb a “predicate” or a “property”
Subject Property Object
<Patient HB2122> <shows_sign> <Disease Pneumococcal_Meningitis>
112
Universal, semantic connectivity supports the construction of elaborate structures.
What RDF Gets You
113
Losing Connectedness in Tables
Casp2
Colon
?
Fast Uptake and ease of use, but loose binding to entities and terms
Casp2
Endodermal
114
Data Integration?
• Querying Databases is not sufficient
• Data needs to include the Context of Local Scientists
• Concepts and Vocabulary need to be associated
• More about Sociology than Technology
Information Knowledge
115
Standards- Why Not?
• Good when there’s a majority of agreement
• By vendors, for vendors?
• Mainly about Data Packing; should be more about Semantics (user-defined)
• API dominated (Time trapped)
• Ease and Expressivity
• Too often they’re Brittle and Slow to develop
• “They’re great, that’s why there are so many of them”
116
Data Integration Enables Business Integration: Efficiency and Innovation
• Searching
• Visualization
• Analysis
• Reporting
• Notification
• Navigation
117
Searching…
#1 way for finding information in companies…