HyQue: Evaluating scientific Hypotheses using semantic web technologies
Uploaded by michel-dumontier (Category: Health & Medicine)
Transcript of HyQue: Evaluating scientific Hypotheses using semantic web technologies
HYQUE: EVALUATING SCIENTIFIC HYPOTHESES USING SEMANTIC WEB TECHNOLOGIES
MICHEL DUMONTIER, PHD
ASSOCIATE PROFESSOR OF BIOINFORMATICS, DEPARTMENT OF BIOLOGY, INSTITUTE OF BIOCHEMISTRY AND SCHOOL OF COMPUTER SCIENCE @ CARLETON UNIVERSITY
ADJUNCT PROFESSOR (PROFESSEUR ASSOCIÉ), DEPARTMENT OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING, UNIVERSITÉ LAVAL
HYQUE IS A COLLABORATIVE EFFORT
Work performed by Alison Callahan, a PhD student under my supervision @ Carleton University
Partnership with Dr. Nigam Shah, Assistant Professor at Stanford University
Source: http://kentsimmons.uwinnipeg.ca/cm1504/introscience.htm
WITH UNPARALLELED GROWTH IN RESEARCH OUTPUTS, UNCOVERING ALL THE EVIDENCE TO SUPPORT/REFUTE A HYPOTHESIS IS BECOMING INCREASINGLY DIFFICULT
Citations added to MEDLINE, 1995-2009
Source: http://www.nlm.nih.gov/bsd/stats/cit_added.html
HYBROW
Computationally augmented method for hypothesis evaluation
• developed by Racunas et al. [1]
• minimum event-based vocabulary
• uses consistency checking to evaluate hypotheses
  • constraints to ensure valid claims
  • rules to evaluate evidence
• compares hypotheses using neighborhood functions
• incremental hypothesis improvement
[1] Racunas S. A., Shah N. H., Albert I. and Fedoroff N. V. (2004). HyBrow: A prototype system for computer-aided hypothesis evaluation. Bioinformatics 20(S. 1): i1-i8.
THE GAL GENE NETWORK IN YEAST
• Genes that encode proteins that transport and metabolize galactose
• permease – gal2p – transports galactose into cells
• galactokinase – gal1p
• uridylyltransferase – gal7p
• epimerase – gal10p
• phosphoglucomutase – gal5p
• Regulation – whether the pathway is on or off
  • gal3p
  • gal4p
  • gal80p
Source: Ostergaard et al. (2000). Nature Biotechnology 18: 1283 - 1286
HYPOTHESIS
h1: simple event-based expression
e1 (Gal4p induces expression of GAL1)
h2: conjunctive hypothesis (must satisfy two expressions)
e2 (Gal3p induces expression of GAL2
e3 AND Gal4p induces expression of GAL7)
h3: conjunctive hypothesis with a conditional expression
e4 (Gal4p induces expression of GAL7
e5 AND Gal80p inhibits production of Gal4p
when GAL3 is over-expressed
e6 AND Gal80p induces expression of GAL7)
HYBROW
• small, manually generated knowledge base
• hard-coded Perl rules
• challenging to apply to a new domain
• needs access to a larger knowledge base
SEMANTIC WEB TECHNOLOGIES FOR KNOWLEDGE MANAGEMENT?
Semantic Web technologies are promising for automating hypothesis evaluation
• Languages for formal knowledge representation
• Automated reasoning
• Querying over distributed resources
• Growing number of biological resources available in Semantic Web formats
  • Ontologies
  • Data
Bio2RDF is one of the largest resources of linked life data on the Web
• ~40 data sets available
• Globally distributed
• Dataset-specific SPARQL endpoints
BIO2RDF IS PART OF A GROWING WEB OF LINKED DATA
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
The Semantic Web is a web of knowledge
It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
It makes it possible to answer sophisticated questions
Ontology as a strategy to formally represent knowledge
The Web Ontology Language (OWL) has explicit semantics
It can therefore be used to capture knowledge in a machine-understandable way
HYBROW → HYQUE
• Hypothesis query and evaluation system
• Built on Semantic Web technologies
• Background knowledge encoded as OWL ontologies
• Queries against SPARQL endpoints
• Context-specific rules that consider experimental conditions
• Consumes and produces RDF
• Can be accessed via web or Semantic Web services
HYQUE IS COMPOSED OF …
• HyQue hypothesis ontology
• Describes generic input hypothesis and output hypothesis evaluation classes
• Uses upper level classes e.g. ‘proposition’, ‘measurement value’, ‘event’
• HyQue Data
• Experimentally determined interactions between the GAL proteins (GAL knowledge base from HyBrow project)
• Literature-based evidence (citations)
• Knowledge about cellular localization and biological processes (GO)
• Types of evidence supporting these interactions (ECO)
• Yeast gene/protein/function data (SGD)
A HYQUE HYPOTHESIS IS A COLLECTION OF PROPOSITIONS
• HyQue hypotheses are composed of one or more propositions connected using logical operators (AND, OR, XOR…)
• proposition: “a statement expressing something true or false”
• HyQue propositions only specify events
HyQue hypothesis ≡ ‘proposition’ that ‘specifies’ only ‘event’
HyQue hypothesis ≡ ‘proposition’ that ‘has component part’ only (‘proposition’ that ‘specifies’ only ‘event’)
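The recursive definition above (a hypothesis is a proposition that either specifies events directly or has component propositions joined by logical operators) can be mirrored in a small data structure. The Python sketch below is hypothetical: the class, field and function names are illustrative and are not part of HyQue. It builds h2 from the GAL example and lists the events its propositions specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proposition:
    """Hypothetical model of a HyQue proposition.

    A leaf proposition 'specifies' one event; a compound proposition
    'has component part' propositions joined by a logical operator.
    """
    event: Optional[str] = None       # leaf: the event it specifies
    operator: Optional[str] = None    # compound: 'AND', 'OR', 'XOR', ...
    parts: List[Proposition] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.event is not None

    def events(self) -> List[str]:
        """Collect every event specified anywhere in the proposition tree."""
        if self.is_leaf():
            return [self.event]
        found: List[str] = []
        for part in self.parts:
            found.extend(part.events())
        return found


def leaf(event: str) -> Proposition:
    return Proposition(event=event)


# h2 from the GAL example: (e2 AND e3)
h2 = Proposition(operator="AND", parts=[
    leaf("Gal3p induces expression of GAL2"),
    leaf("Gal4p induces expression of GAL7"),
])
print(h2.events())
```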
HYQUE EVENTS
1. protein-protein binding
2. protein-nucleic acid binding
3. molecular activation
4. molecular inhibition
5. gene induction
6. gene repression
7. transport
HYQUE EVENTS
Events are composed of conditional assertions on a relation between an ‘agent’ and a ‘target’
induces(agent, target, context, location)
To stay within a decidable logic (OWL DL), which supports only binary relations, the n-ary relation is represented as an event object:
Event ‘has agent’ agent ‘has target’ target ‘has context’ context ‘is located in’ location
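The n-ary-to-binary conversion described above can be sketched as follows. This is a minimal illustration, assuming plain string identifiers; the `Event` class and `to_triples` function are hypothetical, while the property labels (‘has agent’, ‘has target’, ‘has context’, ‘is located in’) come from the slide. Each argument of induces(agent, target, context, location) becomes one binary statement about a shared event node, and optional arguments are simply omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Event:
    """Hypothetical container for the n-ary relation
    induces(agent, target, context, location)."""
    id: str
    agent: str
    target: str
    context: Optional[str] = None
    location: Optional[str] = None


def to_triples(event: Event, event_type: str) -> List[Triple]:
    """Reify the n-ary relation as an event node with one binary
    property per argument (OWL properties are binary)."""
    triples: List[Triple] = [
        (event.id, "rdf:type", event_type),
        (event.id, "has agent", event.agent),
        (event.id, "has target", event.target),
    ]
    # Optional arguments are only asserted when they are known.
    if event.context is not None:
        triples.append((event.id, "has context", event.context))
    if event.location is not None:
        triples.append((event.id, "is located in", event.location))
    return triples


e1 = Event(id=":e1", agent="Gal4p", target="GAL1", location="nucleus")
for triple in to_triples(e1, "induce"):
    print(triple)
```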
ALL DATA ARE REPRESENTED USING RDF
[Figure: hypothesis -> has component part -> proposition -> specifies -> event: Gal4p positively regulates the expression of GAL1]
RDF’s basic representation unit is the “triple”
<subject> <predicate> <object>
:h rdf:type hyque:Hypothesis .
:h hyque:has-component-part :p1 .
:p1 rdf:type hyque:Proposition .
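To make the triple model concrete, the sketch below expands the three prefixed triples above into full-IRI, N-Triples-style lines. Only the `rdf:` namespace IRI is the standard one; the IRIs bound to `hyque:` and to the default `:` prefix are hypothetical placeholders, because the slides do not give them. The `expand` and `to_ntriples` helpers are illustrative names, not a HyQue or RDF-library API.

```python
from typing import Iterable, Tuple

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "hyque": "http://example.org/hyque#",  # hypothetical namespace IRI
    "": "http://example.org/data#",        # hypothetical default namespace
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'rdf:type' into a bracketed full IRI."""
    prefix, _, local = curie.partition(":")
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Write one '<subject> <predicate> <object> .' line per triple."""
    return "\n".join(
        " ".join(expand(term) for term in triple) + " ." for triple in triples
    )


triples = [
    (":h", "rdf:type", "hyque:Hypothesis"),
    (":h", "hyque:has-component-part", ":p1"),
    (":p1", "rdf:type", "hyque:Proposition"),
]
print(to_ntriples(triples))
```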
[Figure: alternatively, the hypothesis specifies the event directly: hypothesis -> specifies -> event: Gal4p positively regulates the expression of GAL1]
:h a hyque:Hypothesis ;
    hyque:specifies :e1 .
:e1 a <http://bio2rdf.org/go:0010628> ;  # positive regulation of gene expression
    hyque:is_negated "0" ;
    hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
    hyque:target <http://bio2rdf.org/sgd:GAL1> ;
    …
USER INTERFACE FACILITATES DESIGNING THE HYPOTHESIS
TEMPLATE SPARQL QUERIES ARE COMPLETED BASED ON EVENT PROPERTIES
:e1 a go:0010628 ;
    hyque:is_negated "0" ;
    hyque:agent sgd:Gal4p ;
    hyque:target sgd:GAL1 .
CONSTRUCT { … }
WHERE {
    ?event hyque:is_negated ?negated .
    ?event hyque:logical_operator ?logical_operator .
    ?event hyque:agent <http://bio2rdf.org/sgd:Gal4p> .
    ?event hyque:target <http://bio2rdf.org/sgd:GAL1> .
    …
}
Binding: hypothesis + SPARQL template => SPARQL query
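The binding step (hypothesis + template => query) can be sketched with simple string substitution. This is an illustration only: the full HyQue templates are elided on the slide, so the template text, the `hyque:` namespace IRI and the `build_query` helper below are hypothetical. The agent and target IRIs are taken from event e1.

```python
from string import Template

# Hypothetical 'induce' template; ${agent} and ${target} are filled from the
# hypothesis event. The hyque: namespace IRI is a placeholder.
INDUCE_TEMPLATE = Template("""\
PREFIX hyque: <http://example.org/hyque#>
CONSTRUCT { ?event ?p ?o . }
WHERE {
  ?event hyque:is_negated ?negated .
  ?event hyque:logical_operator ?logical_operator .
  ?event hyque:agent <${agent}> .
  ?event hyque:target <${target}> .
  ?event ?p ?o .
}""")


def build_query(event: dict) -> str:
    """Bind the agent and target of a hypothesis event into the template."""
    return INDUCE_TEMPLATE.substitute(agent=event["agent"], target=event["target"])


e1 = {"agent": "http://bio2rdf.org/sgd:Gal4p", "target": "http://bio2rdf.org/sgd:GAL1"}
print(build_query(e1))
```

`string.Template` only interprets `$` placeholders, so the SPARQL braces and `?` variables pass through unchanged; `substitute` raises `KeyError` if a placeholder is left unbound, which catches incomplete hypotheses early.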
SPARQL QUERY RESULTS RETRIEVED
hybrow_data:f0957524deecae38945736737cc07d45
    hyque:logical_operator <http://bio2rdf.org/go:0010628> ;
    hyque:is_negated "0" ;
    hyque:agent <http://bio2rdf.org/sgd:Gal4p> ;
    hyque:target <http://bio2rdf.org/sgd:GAL1> ;
    hyque:agent_type <http://bio2rdf.org/chebi:36080> ;
    hyque:target_type <http://bio2rdf.org/so:0000236> ;
    hyque:location <http://bio2rdf.org/go:0005634> ;
    hyque:agent_function_type <http://bio2rdf.org/go:0003700> .
Legend: chebi:36080 = protein; so:0000236 = gene; go:0005634 = nucleus; go:0003700 = transcription factor activity; go:0010628 = positive regulation of gene expression
QUERY RESULTS EVALUATED BASED ON RULE SETS
‘induce’ rule (maximum score: 5):
• Is the event negated?
  • If yes, subtract 2
• Is the logical operator ‘induce’ (GO:0010628)?
  • If yes, add 1; if no, subtract 1
• Is the agent of type ‘protein’ (CHEBI:36080) or ‘RNA’?
  • If yes, add 1; if of type ‘gene’, subtract 1
• Is the target of type ‘gene’ (SO:0000236)?
  • If yes, add 1; if no, subtract 1
• Does the agent have known ‘transcription factor activity’ (GO:0003700)?
  • If yes, add 1
• Is the event located in the ‘nucleus’ (GO:0005634)?
  • If yes, add 1; if no, subtract 1
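The ‘induce’ rule set above is simple enough to express directly as code. The sketch below is a hypothetical re-implementation, not HyQue's own rule engine: the `EventFacts` fields stand in for the values retrieved by the SPARQL query, and an agent type that is neither ‘protein’/‘RNA’ nor ‘gene’ is assumed to contribute 0 points, since the rule does not say. The raw points are normalized by the rule's maximum of 5, reproducing the e1 evaluation (TF activity not credited, as stated on the slide).

```python
from dataclasses import dataclass


@dataclass
class EventFacts:
    """Facts retrieved for one event (hypothetical field names)."""
    negated: bool
    logical_operator: str  # e.g. 'induce' (GO:0010628)
    agent_type: str        # e.g. 'protein' (CHEBI:36080)
    target_type: str       # e.g. 'gene' (SO:0000236)
    agent_is_tf: bool      # 'transcription factor activity' (GO:0003700)
    location: str          # e.g. 'nucleus' (GO:0005634)


MAX_INDUCE_SCORE = 5


def induce_points(f: EventFacts) -> int:
    """Apply each 'induce' rule and sum the resulting points."""
    points = 0
    if f.negated:
        points -= 2
    points += 1 if f.logical_operator == "induce" else -1
    if f.agent_type in ("protein", "RNA"):
        points += 1
    elif f.agent_type == "gene":
        points -= 1  # any other agent type: assumed 0 points
    points += 1 if f.target_type == "gene" else -1
    if f.agent_is_tf:
        points += 1
    points += 1 if f.location == "nucleus" else -1
    return points


def induce_score(f: EventFacts) -> float:
    """Normalize by the rule's maximum of 5 points."""
    return induce_points(f) / MAX_INDUCE_SCORE


# e1 as evaluated on the slide (TF activity not credited)
e1 = EventFacts(negated=False, logical_operator="induce", agent_type="protein",
                target_type="gene", agent_is_tf=False, location="nucleus")
print(induce_points(e1), induce_score(e1))  # 4 0.8
```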
EVALUATING HYPOTHESES
e1 (Gal4p induces expression of GAL1)
e1 describes the induction of GAL1 gene expression by Gal4p and is therefore an event of type ‘induce’.
Evaluation:
• Agent of type ‘protein’: yes -> +1
• Target of type ‘gene’: yes -> +1
• Agent has function ‘transcription factor activity’: no -> 0
• Event location is ‘nucleus’: yes -> +1
• Logical operator is ‘induce’: yes -> +1
• Event negated in published literature: no -> 0
Thus, e1 obtains 4 out of a maximum of 5 points, for a normalized score of 4/5 = 0.8.
EVALUATING HYPOTHESES
Events e2, e3 and e4 are also ‘induce’ events; each is evaluated with the ‘induce’ rule set and obtains a score of 0.8.
e5 is undecidable: there are no data in HKB to support that Gal80p inhibits Gal4p when GAL3 is over-expressed
-> the entire third event set (e4, e5, e6) is therefore deemed undecidable.
The overall hypothesis score is selected from the decidable alternatives: e1 (0.8) and e2 AND e3 (0.8 + 0.8 = 1.6)
The final hypothesis score is 1.6; events e2 and e3 have the strongest experimental support.
e1 (Gal4p induces expression of GAL1)
OR
e2 (Gal3p induces expression of GAL2
e3 AND Gal4p induces expression of GAL7)
OR
e4 (Gal4p induces expression of GAL7
e5 AND Gal80p inhibits production of Gal4p
when GAL3 is over-expressed
e6 AND Gal80p induces expression of GAL7)
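The combination step can be sketched as a pair of functions over event scores. The scheme below is inferred from the worked example rather than taken from HyQue's code: conjuncts are summed (0.8 + 0.8 = 1.6), a single undecidable event makes its conjunction undecidable, and OR keeps the best-supported decidable alternative. `None` is used here as a hypothetical marker for “undecidable”; e6 is left out of h3 because the slides do not report its score, and e5 alone already makes h3 undecidable.

```python
from typing import List, Optional

Score = Optional[float]  # None marks an undecidable event or event set


def and_score(scores: List[Score]) -> Score:
    """Conjunction: sum the event scores; one undecidable event makes
    the whole conjunction undecidable."""
    if any(s is None for s in scores):
        return None
    return sum(scores)


def or_score(scores: List[Score]) -> Score:
    """Disjunction: keep the best-supported decidable alternative."""
    decidable = [s for s in scores if s is not None]
    return max(decidable) if decidable else None


h1 = and_score([0.8])        # e1
h2 = and_score([0.8, 0.8])   # e2 AND e3
h3 = and_score([0.8, None])  # e4 AND e5 (e5 undecidable; e6 not scored)
print(or_score([h1, h2, h3]))  # 1.6
```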
HYPOTHESIS EVALUATION REPRESENTED AS RDF
BROWSE HYPOTHESIS AND EVALUATION AS LINKED DATA
http://sadiframework.org
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
The Semantic Automated Discovery and Integration (SADI) framework makes it easy to create Semantic Web services using OWL classes as service inputs and outputs
Users can post a hypothesis in RDF and receive the hypothesis evaluation RDF
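Because a SADI service takes RDF in and returns RDF over HTTP, a client only needs to POST the hypothesis document. The sketch below uses Python's standard `urllib.request` to prepare such a request without sending it. The service URL, the `hyque:` namespace IRI and the choice of `text/turtle` as content type are hypothetical assumptions; check which RDF serializations the deployed HyQue service actually accepts.

```python
import urllib.request

# Hypothetical endpoint: substitute the URL of a deployed HyQue SADI service.
SERVICE_URL = "http://example.org/sadi/hyque"


def build_request(hypothesis_rdf: str,
                  content_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP POST carrying the hypothesis as RDF.

    The content type is an assumption; the service may expect another
    RDF serialization.
    """
    return urllib.request.Request(
        SERVICE_URL,
        data=hypothesis_rdf.encode("utf-8"),
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


hypothesis = """@prefix hyque: <http://example.org/hyque#> .
<http://example.org/data#h> a hyque:Hypothesis ."""
req = build_request(hypothesis)
print(req.get_method(), req.full_url)
# Sending it: urllib.request.urlopen(req).read() would return the response body,
# expected to be the hypothesis evaluation as RDF.
```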
HyQue can become part of an investigative workflow
FUTURE DIRECTIONS
• Investigate alternative, finer-grained scoring systems
• Expand beyond the GAL network with network reconstructions and NLP facilitated data curation
• Collaborative social environment to engineer, share, compare and evaluate hypotheses, and format the results
CONCLUSION
HyQue is a new system to construct and evaluate (automatically obtain support for) hypotheses using formalized background knowledge and data on the Semantic Web
Acknowledgements
Alison Callahan (developing HyQue)
Nigam Shah (key collaborator)
Stephen Racunas and Amar Das for helpful discussions
Bio2RDF: Peter Ansell, Francois Belleau, Alison Callahan, Jacques Corbeil, Jose Cruz-Toledo, Alex De Leon, Steve Etlinger, James Hogan, Nichealla Keith, Jean Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe
SADI: Christopher Baker, Melanie Courtot, Jose Cruz-Toledo, Steve Etlinger, Nichealla Keith, Artjom Klein, Luke McCarthy, Silvane Paixao, Ben Vandervalk, Natalia Villanueva-Rosales, Mark Wilkinson