David Wild, Sheffield Conference, June 2007. Page 1 Indiana University School of
CHEMINFORMATICS
A Web Service Infrastructure for Chem[o]informatics
presented at the 4th Joint Sheffield Conference on Chemoinformatics, June 2007
David J. Wild, [email protected]
Assistant Professor
Indiana University School of Informatics, Bloomington
http://djwild.info
Overview
• Chem[o]informatics at Indiana University
• The web service infrastructure
• Examples of use
  – Mashups and web interfaces
  – Workflows
  – Complex querying of journal articles
  – Greasemonkey scripts
• Looking further in the future
Chem[o]informatics at Indiana University
• Derived from long-standing courses in Chemical Information Handling started by Gary Wiggins
• Moved to School of Informatics in 2000
• Boosted over the last few years through success of the School of Informatics and by NIH Cheminformatics funding (www.chembiogrid.org)
• M.S., Ph.D. and graduate certificate in cheminformatics
• Graduate Certificate through Distance Education
• Research partnership with Community Grids Lab
• More information:
  – http://cheminfo.informatics.indiana.edu
  – http://www.chembiogrid.org
  – Education
    • JCIM 2006; 46(2) pp 495 - 502
    • Drug Discovery Today 11, 9&10 (May 2006), pp 436-439
Developments in the Web World
• Semantic Web / Web 2.0 – “Next Big Thing”
  – Live computation on the web
    • Web services, APIs - e.g. Google Maps (http://www.google.com/apis/)
    • Mash-ups and workflows that use these services - e.g. http://www.programmableweb.com, http://pipes.yahoo.com
  – Social computing
    • Social networks - Facebook, Myspace, Linkedin
    • Information sharing - wikis, blogs, folksonomies, etc
  – Description of meaning as well as content of information
    • Ontology languages, automated reasoning
    • Semantic interoperability of services and information
• Well funded
  – eScience (UK): £200m over 2001-2006 period (http://www.rcuk.ac.uk/escience/, http://www.mygrid.org.uk/)
  – cyberinfrastructure / grid (US): NIH Molecular Libraries Initiative, http://nihroadmap.nih.gov/molecularlibraries/, NSF cyberinfrastructure
Web Services
Chemical Informatics web service infrastructure
• Database Services
  – Local NIH DTP Human Tumor Cell Line set
  – Local PubChem mirror
  – Derived properties database
  – Pub3D, PubDock
  – Synonym service
  – VARUNA quantum chemistry database
• Statistics (based on R)
  – Regression, Neural Nets, Random Forest
  – LDA
  – K-means clustering
  – Plotting
  – T-test and distribution sampling
• Computation Services
  – OpenEye FRED, OMEGA, FILTER, …
  – Cambridge OSCAR3
  – BCI fingerprint generation, Ward’s, Divisive K-means clustering
  – Tox Tree
  – Similarity & fingerprint calculations (CDK)
  – Descriptor calculation (CDK)
  – 2D structure diagrams (CDK)
  – 2D->3D File format conversions
www.chembiogrid.org
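To give a feel for the kind of calculation a similarity service wraps, here is a minimal Python sketch of a Tanimoto comparison between fingerprints. It is not the CDK or BCI implementation: fingerprints are assumed to be plain collections of on-bit positions, and the function names and the 0.7 default threshold are illustrative only.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar(query_bits, library, threshold=0.7):
    """Return (name, score) pairs scoring at or above the threshold, best first.

    `library` is a hypothetical dict mapping a compound name to its on-bit positions.
    """
    hits = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A real service would generate the fingerprints from structures first; the sketch only covers the comparison step.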
The world of Web 2.0 and mash-ups
• Web 2.0 and resulting mashups, etc., are further blurring the boundaries and popularizing the "best bits" of complex subdisciplines: e.g. GIS -> Google Maps -> lots of mashups (770! - see http://www.programmableweb.com/api/GoogleMaps/mashups)
• We can imagine the same happening soon for chemoinformatics (e.g. structure, substructure searching) and bioinformatics (homology modeling etc).
• For more information see http://www.programmableweb.com/ and http://web2.wsj2.com/
• So the first stage is building applications (“mash-ups”) that use web services from one or more disciplines
• Then we start doing some really interesting stuff!
Mashups - Google Maps + Estate Agent db (www.housingmaps.com)
PubChemSR - .NET app for searching PubChem
Available from http://darwin.informatics.indiana.edu/juhur/Tools/PubChemSR/
PubDock - database of docked PubChem Ligands
• 1 million PubChem compounds (druggable) docked into PDB proteins (currently 7 but more coming)
• Two interfaces - web and standalone
• This is really a bioinformatics / chemoinformatics mashup
• Retrieve top hits for a protein
• Organize proteins by similarity between docking profiles over compounds
• Cluster compounds by docking profile across cluster targets
• Uses many web services: PDB services, our PubDock database service, our CDK services etc…
PubDock - Chimera-based interface
Prediction of activity against 40 tumor cell lines
http://rguha.ath.cx/~rguha/ncidtp/dtp
Results for Gleevec
Quick and easy NLP - Kemo, a chatbot for PubChem
http://cheminfo.informatics.indiana.edu:8080/kemo
Workflows - Taverna (taverna.sourceforge.net)
Workflow in Xbaya - a meteorology tool!
http://www.extreme.indiana.edu/xgws/xbaya/
Workflow in .NET
Automatic name detection and structure generation (batch)
• OSCAR3 - Murray Rust Group
  – A tool for shallow, chemistry-specific natural language parsing of chemical documents (e.g. journal articles).
  – It identifies (or attempts to identify):
    • Chemical names: singular nouns, plurals, verbs etc., also formulae and acronyms.
    • Chemical data: Spectra, melting/boiling point, yield etc. in experimental sections.
    • Other entities: Things like N(5)-C(3) and so on.
  – Part of the larger SciBorg effort
    • See http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html
  – http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3
• Lexichem - OpenEye
  – Toolkit for conversion of chemical structure names (IUPAC, traditional) to connection tables, SMILES, InChI, etc.
  – Used by Reel Two (www.reeltwo.com) in their SureChem package for searching patents based on chemical structures
  – http://www.eyesopen.com/products/toolkits/lexichem.html
• ACD/Name to Structure
  – Batch conversion of chemical structure names to and from InChIs and SMILES
  – http://www.acdlabs.com/products/name_lab/rename/batch.html
Harvesting example - a database of abstracts indexed by SMILES
• As proof of concept, ran OSCAR3 on 1 year’s worth of PubMed abstracts (2005-2006) to extract chemical structure names, convert them to SMILES, and index the abstracts by SMILES
• Stored in a PostgreSQL database with gNova CHORD for structure and similarity searching
• Potential for use as a way of detecting new trends in publication as well as for publication alerts based on substructure or similarity
• 208,141 unique abstracts
• 10,468 chemical structure names identified by OSCAR3
• 6,560 unique SMILES (6,448 unique InChIs)
• 3,185 of these have PubChem entries
• Of 10,000 compounds randomly selected from PubChem, 2,500 compounds had names (synonyms) found in the text of the PubMed abstracts
• Ratio of mean number of names in abstracts to papers - 4.172 : 36.67
• In comparison to a random 10,000 compound subset of PubChem, 84% passed the Lipinski Rule of 5 vs 73% for PubChem. Passing the OpenEye Filter was closer (13% vs 15%).
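The idea of indexing abstracts by SMILES can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the pipeline used in the talk: a small hypothetical name-to-SMILES dictionary stands in for OSCAR3 and the name-to-structure step, and an in-memory dict stands in for the PostgreSQL / gNova CHORD database.

```python
from collections import defaultdict

# Hypothetical lookup standing in for OSCAR3 plus a name-to-structure converter.
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
}


def index_abstracts(abstracts, name_to_smiles=NAME_TO_SMILES):
    """Map each SMILES to the sorted IDs of the abstracts that mention a known name for it."""
    index = defaultdict(set)
    for abstract_id, text in abstracts.items():
        lowered = text.lower()
        for name, smiles in name_to_smiles.items():
            if name in lowered:  # naive substring match, for illustration only
                index[smiles].add(abstract_id)
    return {smiles: sorted(ids) for smiles, ids in index.items()}
```

Once abstracts are keyed by structure rather than by name, a structure or similarity search over the keys returns the literature directly.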
Workflow / mash-up of PubMed abstracts and docking
• Create a database containing the text of all recent PubMed abstracts (2006-2007 = ~500,000)
• Use OSCAR to extract all of the chemical names referred to in the abstracts and convert to SMILES
• Convert molecules to 3D and dock into a protein of interest
• Visualize top docked molecules in a Google-like interface
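The stages of this workflow chain naturally as functions. The Python sketch below shows only the chaining: `extract_smiles` and `dock` are hypothetical callables supplied by the caller (standing in for the OSCAR and docking services), and the assumption that a lower score means better binding is ours.

```python
def run_workflow(abstracts, extract_smiles, dock, top_n=3):
    """Extract structures from each abstract, dock each structure once, return the best hits.

    abstracts:      dict of abstract ID -> text
    extract_smiles: hypothetical callable, text -> iterable of SMILES strings
    dock:           hypothetical callable, SMILES -> numeric docking score
    """
    sources = {}
    for abstract_id, text in abstracts.items():
        for smiles in extract_smiles(text):
            ids = sources.setdefault(smiles, [])
            if abstract_id not in ids:
                ids.append(abstract_id)
    scored = [(dock(smiles), smiles, ids) for smiles, ids in sources.items()]
    scored.sort(key=lambda item: item[0])  # assumption: lower score means better binding
    return scored[:top_n]
```

Keeping the abstract IDs alongside each hit is what lets a front end link every docked molecule back to the papers that mention it.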
Marking up chemical structures in web pages using Greasemonkey
http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html
Live PDB links and greasemonkey paper -> blog entry link
http://www.redbrick.dcu.ie/~noel/PDB/findPDB.html
Greasemonkey / OSCAR script
http://cheminfo.informatics.indiana.edu:8080/ChemGM/index.jsp
PubChem - view 3D structure greasemonkey script
http://rna.informatics.indiana.edu/hgopalak/download_Jscript.html
Smart mining of drug discovery information
• For scientists, it’s usually more appropriate to think in terms of information than tools
• Many information questions of interest to scientists are conceptually simple but complex to implement, not necessarily mapping onto individual chemoinformatics algorithms
• Whilst some information needs are recurring and constant, many are unique or rapidly changing
• The gap between what theoretically could be done computationally, and what is done, is currently rather large, for a variety of reasons
Simple questions can be complex to answer…
SCIENTIST: “These compounds look promising from their HTS results. Should I commit some chemistry resources to following them up?”
• Oracle Database (HTS): Compounds were tested against related assays and showed activity, including selectivity within target families
• Oracle Database (Genomics): None of these compounds have been tested in a microarray assay
• Computation: The information in the structures and known activity data is good enough to create a QSAR model with a confidence of 75%
• External Database (Patent): Some structures with a similarity > 0.75 to these appear to be covered by a patent held by a competitor
• Computation: All the compounds pass the Lipinski Rule of Five and toxicity filters
• Excel Spreadsheet (Toxicity): One of the compounds was previously tested for toxicology and was found to have no liver toxicity
• Word Document (Chemistry): Several of the compounds had been followed up in a previous project, and solubility problems prevented further development
• Journal Article: A recent journal article reported the effectiveness of some compounds in a related series against a target in the same family
• Word Document (Marketing): A report by a team in Marketing casts doubt on whether the market for this target is big enough to make development cost-effective
Supercharged Life Science Google (mock up!)
what compounds might bind to the enclosed protein?
By the way… annotation (mock-up!)
By the way…
This compound is very similar to a prescription drug, Tamoxifen.
This compound is referenced in 20 journal articles published in the last 5 years
Similar compounds are associated with the words “toxic” and “death” in 280 web pages
It appears to be covered under 3 patents
It has been shown to be active in 5 screens
Computer models predict it to show some activity against 8 protein targets
Here are some comments on this compound:
David Wild: don’t take any notice of the computational models - they are rubbish
Indexing the world’s chemical information AND functionality
• Expose databases and computational functionality as web services
  – Wrap as much computational capability as we can as web services
  – Have databases accessible in a standard way (PostgreSQL / gNova CHORD)
  – Make it easy to access (c.f. Google Maps API)
  – Innovate with mash-ups
• Crawl and index web pages, journal articles, etc. for structures (InChIs, SMILES), images (converted using Clide or ChemReader), names (converted using OSCAR3 or similar package), other information (IR spectra, reactions, etc…)
  – pull information into searchable databases
  – tag or annotate in situ
  – federate with existing databases (PubChem etc)
• Now we know what information we have, and what we can do with it, develop clever front ends to do useful things
  – workflows and mashups
  – “by the way” annotations
  – natural language interfaces
  – ontologies and reasoning tools (unless something else works better!)
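Of the items a crawler might look for, InChI strings are the easiest to spot because they carry a fixed prefix. The Python sketch below is a rough illustration, not a validating parser: the pattern simply accepts any run of non-whitespace characters after `InChI=1/` or `InChI=1S/`, and the function name is hypothetical.

```python
import re

# An "InChI=1/" or "InChI=1S/" prefix followed by any non-whitespace characters.
INCHI_PATTERN = re.compile(r"InChI=1S?/\S+")


def find_inchis(text):
    """Return the distinct InChI-like strings in text, in order of first appearance."""
    found = []
    for match in INCHI_PATTERN.finditer(text):
        # Drop trailing sentence punctuation. A closing parenthesis is kept on
        # purpose, because a genuine InChI can end with one.
        candidate = match.group(0).rstrip(".,;")
        if candidate not in found:
            found.append(candidate)
    return found
```

SMILES strings have no such prefix, so detecting them in free text would need a different, more heuristic approach.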
OWL-S Web Ontology Language
• Profile
  – Describe what a service does (semantically)
• Process model
  – Detailed description of a service's operation
• Grounding
  – How to get access to the service
Generation of OWL-S in Protégé
http://owlseditor.semwebcentral.org
Matching query input with OWL-S
• Still working on this!
• Can easily NLP straightforward queries. Harder part is mapping onto ontologies
• Suppose the user query is submitted as the following
  – “Find the drug-like compounds similar to compound X from PubChem that have a Tanimoto coefficient value higher than 0.7”
• Description Logic construct is asserted as following:
SimilaritySearch ⊑ DatabaseSearchService
  ⊓ HasInput.2Dstructure
  ⊓ HasInput.TanimotoCoefficient
  ⊓ HasInput.2
  ⊓ HasOutput.2DstructureSet
  ⊓ HasOutput.1
Filter ⊑ OpeneyeSoftware
  ⊓ HasInput.2DstructureSet
  ⊓ HasInput.1
  ⊓ HasOutput.DruglikeCompounds
  ⊓ HasOutput.1
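One simple way to picture the matching step is forward chaining over declared inputs and outputs. The Python sketch below takes the SimilaritySearch and Filter descriptions as plain sets of type names and works out an order in which to call them. It is an assumption-laden illustration, not an OWL-S reasoner: `ServiceProfile` and `plan` are hypothetical names, and type matching is by exact string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical, stripped-down stand-in for a service profile."""
    name: str
    inputs: frozenset
    outputs: frozenset


SERVICES = [
    ServiceProfile("Filter", frozenset({"2DstructureSet"}),
                   frozenset({"DruglikeCompounds"})),
    ServiceProfile("SimilaritySearch",
                   frozenset({"2Dstructure", "TanimotoCoefficient"}),
                   frozenset({"2DstructureSet"})),
]


def plan(available, goal, services=SERVICES):
    """Forward-chain: keep running any unused service whose inputs are all available.

    Returns service names in call order, or None if the goal type is unreachable.
    The search is greedy, so the plan may include services the goal does not need.
    """
    have = set(available)
    steps = []
    progress = True
    while goal not in have and progress:
        progress = False
        for service in services:
            if service.name not in steps and service.inputs <= have:
                have |= service.outputs
                steps.append(service.name)
                progress = True
                if goal in have:
                    break
    return steps if goal in have else None
```

For the example query, the parsed inputs are a 2D structure and a Tanimoto threshold and the goal is drug-like compounds, so `plan({"2Dstructure", "TanimotoCoefficient"}, "DruglikeCompounds")` chains SimilaritySearch into Filter.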
Summary
• Web services allow us to expose data and computational ability in standard ways so that they can be used in association with other methods (in cheminformatics or elsewhere)
• It’s all part of a wider web development, which is still immature but is undoubtedly the way things are going (on the web at least)
• Semantic Web / Web 2.0 are still rather different
• Mashups, workflows and automated reasoning tools offer the possibility of better mapping techniques and data to real scientific information needs in a manner which is straightforward for the scientists
• Power is increased when structural information can be mined from journal articles and other text documents
• But, we have to worry about reliability, critical dependency, applicability, and interpretability
Orac & Slave: Blakes Seven
Orac is a highly advanced supercomputer developed by the scientist Ensor. It is extremely terse and short-tempered. Orac has the ability to communicate with all other computers that carry tarriel cells and hence provide the Liberator crew with valuable knowledge. Through calculation of probability, Orac can predict the future. Orac's systems are multi-dimensional; it projects a carrier beam through the same dimension that allows telepaths to transfer thoughts.
Slave is the master computer on the group's second vessel, the Scorpio, programmed with a particularly servile personality
taken from wikipedia
Acknowledgements
• My research group: Rajarshi Guha, Xiao Dong, David Jiao, Junguk Hur, Harini Gopalakrishnan, Huijun Wang
• Marlon Pierce, Randy Heiland, Jake Kim
• Gary Wiggins & Geoffrey Fox
• Peter Murray Rust, Peter Corbett (Cambridge)
• Funding:
  – NIH Exploratory Centers for Cheminformatics Research grant
  – Microsoft Smart Clients for eScience grant