IBM Watson Research
© 2004 IBM Corporation
BioHaystack: Gateway to the Biological Semantic Web
Dennis Quan
Problems in bioinformatics
A myriad of public databases each hold specific facets of information about biological objects of interest (e.g., proteins and genes)
Each database has its own access protocol, data format, naming convention, and means of describing relationships to objects in other databases
Different software required to view information from different databases
– User must be keenly aware of which tool or site to use
– Relevant information comes in fragments
– Exploration process is discontinuous
A common naming convention: LSID URNs
Life Sciences Identifiers (LSIDs) are URNs for biological objects that are backed by RDF metadata:
– E.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240
The LSID and LSID protocol (SOAP-based) specifications are sponsored by I3C and undergoing standardization at OMG
Most of the publicly available bioinformatics databases are available via LSID today
– PDB LSID authority online; “proxy” LSID authorities for databases such as NIH databases, SwissProt hosted by I3C
LSID clients and servers are easy to set up
– IBM Internet Technology group provides Open Source LSID client and server software for a variety of languages and platforms
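To make the naming convention concrete, here is a minimal sketch of splitting an LSID URN into its parts. It assumes the general LSID layout urn:lsid:authority:namespace:object with an optional trailing revision; the Lsid type and parse_lsid function are hypothetical names chosen for illustration and are not part of the IBM LSID toolkit.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID URN (hypothetical helper type, for illustration)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into fields."""
    parts = urn.split(":")
    # Expect 5 fields, or 6 when a revision is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID URN: {urn!r}")
    # The 'urn' and 'lsid' prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    # Authority, namespace, and object must all be non-empty.
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# The example LSID from this slide.
example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240")
print(example.authority, example.namespace, example.object_id)
```

This only inspects the identifier string. Resolving an LSID to its data and RDF metadata goes through the LSID protocol, which the sketch does not attempt.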
RDF/XML: on demand data integration
[Figure: three RDF fragments, one each from GenBank, Gene Ontology, and PDB, all describing the same human hemoglobin LSID]
– GenBank: human hemoglobin LSID derives from atagccgtacctgcgagtctagaagct
– Gene Ontology: human hemoglobin LSID is an oxygen transport protein
– PDB: human hemoglobin LSID has 3D structure
– Unified view: the three fragments combine into a single graph around the shared LSID, carrying all three relationships
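The on-demand integration on this slide can be sketched in a few lines: because every source names human hemoglobin with the same LSID, combining their statements is just a union of triples. This is an illustrative sketch in plain Python, not how Haystack or an RDF library does it; the LSID strings under example.org, the property names, and the merge and describe helpers are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical LSIDs; real identifiers would come from an LSID authority.
HEMOGLOBIN = "urn:lsid:example.org:protein:human_hemoglobin"
STRUCTURE = "urn:lsid:example.org:structure:human_hemoglobin_3d"

# Each source contributes (subject, predicate, object) triples
# about the same subject LSID.
genbank = {(HEMOGLOBIN, "derivesFrom", "atagccgtacctgcgagtctagaagct")}
gene_ontology = {(HEMOGLOBIN, "isA", "oxygen transport protein")}
pdb = {(HEMOGLOBIN, "has3DStructure", STRUCTURE)}


def merge(*graphs):
    """Union the triple sets; shared subject identifiers line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every predicate and its objects for one subject."""
    facets = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            facets[p].append(o)
    return {p: sorted(objs) for p, objs in facets.items()}


unified = merge(genbank, gene_ontology, pdb)
view = describe(unified, HEMOGLOBIN)
for predicate in sorted(view):
    print(predicate, view[predicate])
```

No schema mapping step is needed in this sketch because the shared identifier does the joining, which is the point of pairing LSIDs with RDF.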
Haystack: letting users interact with their data
Haystack is a tool for creating, exploring, and organizing information:
– Personal information: e-mails, contacts, documents, etc.
– Bioinformatics: proteins, publications, genes, etc.
Research project originating from MIT CSAIL
Uses RDF as an underlying data model
Built on Java and Eclipse, IBM’s Open Source rich client platform
http://haystack.lcs.mit.edu/
Browsing highly interconnected information
Single screen presents multiple facets of a single object originating from separate databases
Users navigate the information space as they would in a Web browser: hyperlinking, drag and drop, etc.
Personalization
People keep track of their information by personalizing their workspaces:
– Grouping paperwork into folders
– Highlighting important text in documents
– Attaching sticky notes as reminders
– Jotting down lists of related items
Haystack has pervasive support for annotation and allows users to group related objects together arbitrarily for their own purposes
BioHaystack
BioHaystack: application of Haystack technologies to bioinformatics problems
– Integrated environment for working with biological data
– Intended for end users, i.e., non-programmers
– Builds on LSID, RDF, and Haystack
Integration offers the promise of lowering barriers to access to different backend systems (e.g., LSID servers, Grids, Web Services, relational databases, annotation servers)
Just as the Web browser acts as a client for Web content, BioHaystack can act as a client for biological Semantic Web content and services
Real world collaboration: myGrid
UK-funded joint project with the University of Manchester and other UK research institutions
RDF-based platform for supporting e-Science experiments
Real use cases; developed in collaboration with bioinformaticians
myGrid creates LSIDs and RDF metadata in the process of enacting experiments for scientists
BioHaystack is used as a browser for this metadata
myGrid Architecture
[Figure: myGrid architecture diagram]
– Components: Registry, mIR, Discovery View, Haystack Provenance Browser, FreeFluo Enactor, Taverna WF Builder, Pedro Annotation tool, Ontology Store, Others
– Service interfaces: WSDL, Soap-lab
– Roles: Scientists, Bioinformaticians, Service Providers, Annotation providers
– Other diagram labels: Interface Description; Annotation/description; Query & Retrieve; Workflow Execution; Store data/knowledge; invoking; Query & register; Data descriptions; Vocabulary
Courtesy of Professor Carole Goble, University of Manchester
BioHaystack + myGrid
Courtesy of Professor Carole Goble, University of Manchester
Thank you for your attention
Dennis Quan (IBM Watson Research)
Haystack project home page (download available May 24)
– http://haystack.lcs.mit.edu/
IBM LSID home page
– http://www.ibm.com/developerworks/oss/lsid/
myGrid home page
– http://www.mygrid.org.uk/
See also our session on constructing Haystack applications:
– Developer’s Day, Saturday, 4:30pm