1 E-Chemistry and Web 2.0 Marlon Pierce [email protected] Community Grids Lab Indiana...
Date posted: 18-Dec-2015
2
One Talk, Two Projects
NIH-funded Chemical Informatics and Cyberinfrastructure Collaboratory (CICC) @ IU: Geoffrey Fox, Gary Wiggins, Rajarshi Guha, David Wild, Mookie Baik, Kevin Gilbert, and others
Proposed Microsoft-funded project: E-Chemistry. Carl Lagoze (Cornell), Lee Giles (PSU), Steve Bryant (NIH), Jeremy Frey (Soton), Peter Murray-Rust (Cambridge), Herbert Van de Sompel (Los Alamos), Geoffrey Fox (Indiana), and others
3
CICC Infrastructure Vision
Chemical Informatics: drug discovery and other academic chemistry, pharmacology, and bioinformatics research will be aided by powerful, modern, open information technology.
NIH PubChem and PubMed provide unprecedented open, free data and information.
We need a corresponding open service architecture (i.e. avoid stove-piped applications).
CICC set up as distributed cyberinfrastructure in eScience model:
Web clients (user interfaces) to distributed databases, results of high throughput screening instruments, results of computational chemical simulations and other analyses. Composed of clients to open service APIs (mash-ups). Aggregated into portals.
Web services manipulate this data and are combined into workflows.
So our main agenda items: create interesting databases and build lots of Web services and clients.
CICC Databases
Most of our databases aim to add value to PubChem or link into PubChem.
1D (SMILES) and 2D structures
3D structures (MMFF94): searchable by CID, SMARTS, 3D similarity
Docked ligands (FRED, Autodock): 906K drug-like compounds into 7 ligands; will eventually cover ~2000 targets
Philosophy: we have big computers, so let’s calculate everything ahead of time and put the results in a DB.
5
Building Up the Infrastructure
Our SOA philosophy: use standard Web services. Mostly stateless. Some cluster, HPC work needed, but these populate databases.
Services are aggregate-able into different workflows: Taverna, Pipeline Pilot, …
You can also build lots of Web clients.
See http://www.chembiogrid.org/wiki/index.php/CICC_Web_Resources for links and details.
Not so far from Web 2.0….
6
Sample Services
Type | Service | Functionality | Source | License
Database | Docking | Provides access to the results of docking a subset of PubChem into a set of ligands. Searchable by 2D structure and docking score | Indiana University | Freely accessible
Database | 3D Structure | Provides access to 3D structures generated for most of PubChem | Indiana University | Freely accessible
Cheminformatics | OSCAR3 | Extract chemical structures from text | Cambridge University | Freely accessible
Cheminformatics | InChiGoogle | Uses Google to search for an InChI | Cambridge University | Freely accessible
Cheminformatics | CMLRSSServer | Generates a CMLRSS feed from CML data | Cambridge University | Freely accessible
Cheminformatics | OpenBabel | Converts chemical file formats | Cambridge University | Freely accessible
7
Cheminformatics | ToxTreeServer | Obtains toxicity hazard predictions | Indiana University & European Chemical Bureau | Freely accessible
Cheminformatics | DBUtil | Generates 166-bit MACCS keys | Indiana University & gNova Consulting | Freely accessible
Cheminformatics | Molecular Similarity | Evaluates 2D/3D similarity and evaluates distance moments for 3D similarity calculations | Indiana University & CDK | Freely accessible
Cheminformatics | Molecular Descriptors | Generates various descriptors including TPSA, XLogP, surface areas | Indiana University & CDK | Freely accessible
Cheminformatics | 2D Structure Diagrams | Generates 2D structure diagrams from SMILES | Indiana University & CDK | Freely accessible
Cheminformatics | Druglikeness Methods | Evaluates measures of druglikeness | Indiana University & CDK | Freely accessible
Cheminformatics | Utility Methods | Generates hashed fingerprints, 2D coordinates, etc. | Indiana University & CDK | Freely accessible
8
Statistics | Sampling Distributions | Samples from several distributions (normal, uniform, Weibull etc) | Indiana University | Freely accessible
Statistics | Linear Regression | Builds linear regression models | Indiana University | Freely accessible
Statistics | CNN Regression | Builds neural network regression models | Indiana University | Freely accessible
Statistics | RF Regression | Builds random forest regression models | Indiana University | Freely accessible
Statistics | LDA | Builds linear discriminant analysis models | Indiana University | Freely accessible
Statistics | K-Means | Performs K-means clustering | Indiana University | Freely accessible
Statistics | Feature Selection | Performs feature selection using stepwise regression | Indiana University | Freely accessible
Statistics | XY Plots | Generates 2D scatter plots | Indiana University | Freely accessible
Statistics | Histogram Plots | Generates histograms | Indiana University | Freely accessible
9
Data Exchange | TabToVOTables | Converts tab delimited files to VOTables | Indiana University | Freely accessible
Data Exchange | VOTablesToTab | Converts VOTables to tab delimited files | Indiana University | Freely accessible
Data Exchange | VOTablesToXLS | Converts VOTables to Excel spreadsheet | Indiana University | Freely accessible
Data Exchange | VOTable Retrieve | Retrieves field names and data types from a VOTables document | Indiana University | Freely accessible
Data Exchange | VOTableExtract | Extracts columns from a VOTables document | Indiana University | Freely accessible
Computational Chemistry | Varuna File Format | Handles file formats for QM/MM packages | Indiana University | Freely accessible
Computational Chemistry | Varuna Analysis | Performs analysis of results from Jaguar and ADF | Indiana University | Freely accessible
Computational Chemistry | Varuna Query | Searches the Varuna database | Indiana University | Freely accessible
Computational Chemistry | Varuna Submit | Submits input data for calculation on a local cluster | Indiana University | Freely accessible
10
Application | Fred | Performs docking | Openeye Software | Commercial
Application | Filter | Property calculation and filtering | Openeye Software | Commercial
Application | Omega | Generates 3D conformers | Openeye Software | Commercial
Application | BCI Fingerprint | Generates 1052 BCI structural keys | Digital Chemistry | Commercial
Application | BCI Clustering | Performs divisive k-means clustering | Digital Chemistry | Commercial
Application | PkCell | Evaluates pharmacokinetic parameters for druglike molecules | Indiana University & University of Michigan | Freely accessible
Application | Scripps MLSCN Toxicity | Gets toxicity predictions for RF models built using MLSCN cell-line data | Indiana University & Scripps, FL. | Freely accessible
Application | NTP DTP Anti-cancer activity | Gets anti-cancer activity predictions for the 60 NCI cell lines | Indiana University | Freely accessible
Application | Ames Mutagenicity | Gets mutagenicity predictions | Indiana University | Freely accessible
11
Web Client Interfaces
Name | Functionality | Type | Links
PubDock | Interface to the docking database | Web | http://www.chembiogrid.org/cheminfo/dock/
Pub3D | Interface to the 3D structure database | Web | http://www.chembiogrid.org/cheminfo/p3d/
Frequent Hitters | Identify compounds that occur in multiple assays, with links to individual assays | Web | http://www.chembiogrid.org/cheminfo/freqhit/fh
MLSCN Toxicity Predictions | Predict whether a compound will be toxic or not | Web and Pipeline Pilot | http://www.chembiogrid.org/cheminfo/rws/scripps
ToxTree | Predict toxicity hazard class | Web | http://cheminfo.informatics.indiana.edu/~rguha/code/java/cdkws/cdkws.html#tox
DTP Anti-Cancer Predictions | Predict whether a compound exhibits anti-cancer activity against the 60 NCI cell lines | Web | http://www.chembiogrid.org/cheminfo/ncidtp/dtp
12
More Clients…
Ames Mutagenicity Predictions | Predict whether a compound is mutagenic or not in the Ames test | Web | http://www.chembiogrid.org/cheminfo/rws/ames
PkCell | Evaluate pharmacokinetic parameters | Web | http://www.chembiogrid.org/cheminfo/pkcell/
Kemo | Natural language interface to PubChem | Web | http://cheminfo.informatics.indiana.edu:8080/kemo/
RSS Feeds | Generate RSS feeds for various PubChem related queries | Web and RSS feed | http://www.chembiogrid.org/cheminfo/rssint.html
Statistical Model Download | Download statistical models as R binary files | Web | http://www.chembiogrid.org/cheminfo/rws/mlist
Cheminformatics | Miscellaneous functions such as structure diagrams, similarity etc. | Web | http://cheminfo.informatics.indiana.edu/~rguha/code/java/cdkws/cdkws.html
13
More Clients…
Varuna | File operations and result analysis | Web | http://129.79.139.29/filecon/Default.aspx and http://129.79.139.29/utilityclient/Default.aspx
VOTables | Plotting data using VOTables as well as using Excel files via VOTables | Web | http://gf1.ucs.indiana.edu:9080/axis/VOTables.html and http://www.chembiogrid.org/cheminfo/rws/xlsvor
PubChemSR | .Net interface to PubChem | Desktop application | http://darwin.informatics.indiana.edu/juhur/Tools/PubChemSR/
rpubchem and rcdk | R packages to interface with the CDK and access PubChem | Desktop application | http://cran.r-project.org/src/contrib/Descriptions/rcdk.html and http://cran.r-project.org/src/contrib/Descriptions/rpubchem.html
Chimera plugin | A plugin to allow Chimera to utilize the PubDock database | Desktop application (requires Chimera) | http://poincare.uits.iupui.edu/~heiland/cicc/code/
PubChem 3D View | A Greasemonkey script that shows 3D structures when viewing PubChem pages | Web (requires Firefox and Greasemonkey) | http://rna.informatics.indiana.edu/hgopalak/3DStructView.user.js
14
Example: PubDock
Database of approximately 1 million PubChem structures (the most drug-like) docked into proteins taken from the PDB
Available as a web service, so structures can be accessed in your own programs, or using workflow tools like Pipeline Pilot
Several interfaces developed, including one based on Chimera (right) which integrates the database with the PDB to allow browsing of compounds in different targets, or different compounds in the same target
Can be used as a tool to help understand molecular basis of activity in cellular or image based assays
15
Example: R Statistics applied to PubChem data
By exposing the R statistical package, and the Chemistry Development Kit (CDK) toolkit as web services and integrating them with PubChem, we can quickly and easily perform statistical analysis and virtual screening of PubChem assay data
Predictive models for particular screens are exposed as web services, and can be used either as simple web tools or integrated into other applications
Example uses DTP Tumor Cell Line screens - a predictive model using Random Forests in R makes predictions of probability of activity across multiple cell lines.
16
Example assay screening workflow: finding cell-protein relationships
A protein implicated in tumor growth with known ligand is selected (in this case HSP90 taken from the PDB 1Y4 complex)
The screening data from a cellular HTS assay is similarity searched for compounds with similar 2D structures to the ligand.
Similar structures to the ligand can be browsed using client portlets.
Similar structures are filtered for drugability, are converted to 3D, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the JMOL applet.
Docking results and activity patterns are fed into R services for building of activity models and correlations: Least Squares Regression, Random Forests, Neural Nets
17
Relevance to Web 2.0
Some Web 2.0 Key Features
REST Services
Use of RSS/Atom feeds
Client interfaces are “mashups”
Gadgets, widgets for portals aggregate clients
So…
We provide RSS as an alternative WS format.
We have experimented with RSS feeds, using Yahoo Pipes to manipulate multiple feeds.
CICC Web interfaces can be easily wrapped as universal gadgets in iGoogle, Netvibes.
Alternative to classic science gateways.
RSS Feeds/REST Services
Provide access to DBs via RSS feeds
Feeds include 2D/3D structures in CML
Viewable in Bioclipse, Jmol as well as Sage etc.
Two feeds currently available:
SynSearch – get structures based on full or partial chemical names
DockSearch – get best N structures for a target
Really hampered by size of DB and Postgres performance.
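As a rough illustration of how a client might consume such a feed, the sketch below parses an RSS 2.0 document with Python's standard library. The feed body, item titles, and example.org links are invented for illustration; the actual SynSearch and DockSearch feed layout is not described on these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed body; a real client would fetch this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>DockSearch results (illustrative)</title>
    <item>
      <title>Compound A</title>
      <link>http://example.org/structures/a.cml</link>
    </item>
    <item>
      <title>Compound B</title>
      <link>http://example.org/structures/b.cml</link>
    </item>
  </channel>
</rss>"""


def list_feed_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((item.findtext("title"), item.findtext("link")))
    return items


if __name__ == "__main__":
    for title, link in list_feed_items(SAMPLE_FEED):
        print(title, link)
```

Each link could then be handed to a CML-aware viewer such as the ones named above.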
19
Tools and mashups based on web service infrastructure
http://www.chembiogrid.org/projects/proj_tools.html
20
Mining information from journal articles
Until now, SciFinder / CAS has been the only chemistry-aware portal into journal information
We can access full text of journal articles online (with subscription)
ACS does not make full text available … but there are ways round that!
RSC is now marking up with SMILES and GO/Goldbook terms! www.projectprospect.org
Having SMILES or InChI means that we can build a similarity/structure searchable database of papers: e.g. “find me all the papers published since 2000 which contain a structure with >90% similarity to this one”
In the absence of full text, we can at least use the abstract
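To make the ">90% similarity" query concrete, here is a minimal sketch. It assumes structures are represented as fingerprints (sets of "on" bit positions) and compared with the Tanimoto coefficient, a common choice in cheminformatics; the slides do not say which fingerprint or similarity measure is used, and the DOIs and fingerprints below are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_papers(query_fp, paper_index, threshold=0.9):
    """Return (doi, best score) for papers holding a structure scoring above threshold."""
    hits = []
    for doi, fingerprints in paper_index.items():
        best = max((tanimoto(query_fp, fp) for fp in fingerprints), default=0.0)
        if best > threshold:
            hits.append((doi, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    query = set(range(1, 11))
    # Hypothetical index: DOI -> fingerprints of the structures found in that paper.
    index = {
        "10.0000/example.1": [set(range(1, 11))],
        "10.0000/example.2": [set(range(1, 10)) | {11}],
        "10.0000/example.3": [set(range(1, 12))],
    }
    for doi, score in similar_papers(query, index):
        print(doi, round(score, 3))
```

A date filter ("published since 2000") would be one more condition on the paper record; it is left out here to keep the sketch short.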
21
Text Mining: OSCAR
A tool for shallow, chemistry-specific natural language parsing of chemical documents (e.g. journal articles).
It identifies (or attempts to identify):
Chemical names: singular nouns, plurals, verbs etc., also formulae and acronyms.
Chemical data: Spectra, melting/boiling point, yield etc. in experimental sections.
Other entities: Things like N(5)-C(3) and so on.
Part of the larger SciBorg effort
See http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html and http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3
Mash-Up: What published compounds might bind to this protein?
Create a database containing the text of all recent PubMed abstracts (2006-2007 = ~500,000)
Use OSCAR to extract all of the chemical names referred to in the abstracts and convert to SMILES
Convert molecules to 3D and dock into a protein of interest
Visualize top docked molecules in a Google-like interface
[Figure: DATABASE SERVICE + DOCKING SERVICE]
23
E-Chemistry and Digital Libraries
We can’t wait to get started….
24
E-Chemistry and Digital Libraries
Key problem with our SOA-based e-Science is information management. Where is the service that I need? What does it do?
We may consider our data-centric services to be digital libraries.
Data is diverse: documents, not just computational information like structures.
Another point of view: how can I link together publications, results, workflows, etc? That is, I need to manage digital documents.
25
Digital Libraries
Open Archives Initiative Object Reuse and Exchange Project (OAI-ORE)
Developing standardized, interoperable, and machine-readable mechanisms to express information about compound information objects on the web.
Graph-based representations of connected digital objects.
Objects may be encoded in (for example) RDF or XML.
Retrievable via repositories with REST service interfaces (cf. Atom Publishing Protocol)
Obtain, harvest, and register
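The graph idea can be sketched with plain subject-predicate-object triples. This is only an illustration: the predicate names `ore:describes` and `ore:aggregates` come from the OAI-ORE vocabulary, but every URI below is hypothetical and a real system would use an RDF library and a repository rather than an in-memory list.

```python
# A compound object as triples. All example.org URIs are hypothetical.
TRIPLES = [
    ("http://example.org/rem/1", "ore:describes", "http://example.org/aggregation/1"),
    ("http://example.org/aggregation/1", "ore:aggregates", "http://example.org/paper.pdf"),
    ("http://example.org/aggregation/1", "ore:aggregates", "http://example.org/workflow.xml"),
    ("http://example.org/aggregation/1", "ore:aggregates", "http://example.org/results.cml"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked from subject by predicate, in stored order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def aggregated_resources(triples, resource_map_uri):
    """Follow resource map -> aggregation -> aggregated resources."""
    resources = []
    for aggregation in objects_of(triples, resource_map_uri, "ore:describes"):
        resources.extend(objects_of(triples, aggregation, "ore:aggregates"))
    return resources


if __name__ == "__main__":
    for uri in aggregated_resources(TRIPLES, "http://example.org/rem/1"):
        print(uri)
```

Here one aggregation ties a paper, the workflow that produced its results, and the results themselves into a single addressable object.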
28
Challenges for E-Chemistry
Can digital library principles be applied to data as well as documents? Can you link your workflow to your conference paper?
Can we engineer a publishing framework and message formats around Web 2.0 principles? REST, Atom Publishing Protocol, Atom Syndication Format, JSON, Microformats
Can we do this securely? Access control, provenance, identity federation are key problems.
Institution | Project Focus
Cambridge | Retrospective Data Extraction; Searching and Indexing; Data Models/Ontologies; Tools and Applications
Cornell | Data Models; Interoperability infrastructure; Project Management; Publicity and outreach
Indiana | Infrastructure Integration; Trust and Provenance; Tools and Applications
LANL | Data Models; Interoperability infrastructure
PubChem | Chemical Structure Archive; Results of Experimental Biological Activity Testing; Cross References to BioMedical Databases
Penn State | Retrospective Data Extraction; Searching and Indexing; Analysis
Southampton | Prospective & Retrospective Data Provision; Tools and Applications; In-process capture of eChemistry data; Data Linking in analysis and publication
30
More Information
Project Web Site: www.chembiogrid.org
Project Wiki: www.chembiogrid.org/wiki
Contact me: [email protected]
31
CICC Combines Grid Computing with Chemical Informatics
CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
Funded by the National Institutes of Health
www.chembiogrid.org
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
Science and Cyberinfrastructure
Large Scale Computing Challenges
Chemical Informatics is a non-traditional area of high performance computing, but many new, challenging problems may be investigated.
CICC is an NIH funded project to support chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.
CICC supports the NIH mission by combining state of the art chemical informatics techniques with
• World class high performance computing
• National-scale computing resources (TeraGrid)
• Internet-standard web services
• International activities for service orchestration
• Open distributed computing infrastructure for scientists world wide
[Figure: processing pipeline. Box labels: NIH PubMed DataBase; OSCAR Text Analysis; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; POVRay Parallel Rendering; IU’s Varuna DataBase; NIH PubChem DataBase]
Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
Big Red (and the TeraGrid) will also enable us to perform time consuming, multi-stepped Quantum Chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible by the scientific community.
MLSCN Post-HTS Biology Decision Support
Percent Inhibition or IC50 data is retrieved from HTS
Question: Was this screen successful?
Question: What should the active/inactive cutoffs be?
Question: What can we learn about the target protein or cell line from this screen?
Compounds submitted to PubChem
Workflows encoding distribution analysis of screening results
Grids can link data analysis (e.g. image processing developed in existing Grids), traditional Chem-informatics tools, as well as annotation tools (Semantic Web, del.icio.us) and enhance lead ID and SAR analysis
A Grid of Grids linking collections of services at PubChem, ECCR centers, MLSCN centers
Workflows encoding plate & control well statistics, distribution analysis, etc
Workflows encoding statistical comparison of results to similar screens, docking of compounds into proteins to correlate binding with activity, literature search of active compounds, etc
[Figure labels: CHEMINFORMATICS, PROCESS, GRIDS]
34
R Web Services
35
Why?
Need access to math and stat functionality
Did not want to recode algorithms
Wanted latest methods
Needed a distributed approach to computation
Keep computation on a powerful machine
Access it from a smaller machine
36
Why R?
Free, open-source
Many cutting edge methods available
Flexible programming language
Interfaces with many languages: Python, Perl, Java, C
37
The R Server
R can be run as a remote compute server
Requires the Rserve package
Allows authenticated access over TCP/IP
Connections can maintain state
Client libraries for Java & C
38
R as a Web Service
On its own the R server is not a web service
We provide Java frontends to specific functionalities
The frontend classes are hosted in a Tomcat web container
Accessible via SOAP
Full Javadocs for all available WS’s
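For readers unfamiliar with SOAP, the sketch below builds the kind of XML envelope a client library would send to such a service. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the operation name `sampleNormal`, and its parameters are hypothetical and are not taken from the CICC Javadocs.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/rws"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as text) calling operation with simple parameters."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_soap_request("sampleNormal", {"n": 5, "mean": 0.0}))
```

In practice a SOAP toolkit generates this envelope from the service's WSDL, so clients rarely write it by hand.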
39
Flowchart
40
Functionality
Two classes of functionality
General functions:
Allows you to supply data and build a predictive model
Sample from various distributions
Obtain scatter plots and histograms
Model development functions use a Java front-end to encapsulate model-specific information
41
Functionality
Two classes of functionality
Model deployment:
Allows you to build a model outside of the infrastructure
Place the final model in the infrastructure
Becomes available as a web service
Each model deployed requires its own front-end class
In general, these classes are identical - could be autogenerated
42
Available Functionality
Predictive models - OLS, RF, CNN, LDA
Clustering - k-means
Statistical distributions
XY plot and scatter plots
Model deployment for single model types and ensemble model types
43
Deployed Models
Since deployed models are visible as web services we can build a simple web front end for them
Examples: NCI anti-cancer predictions, Ames mutagenicity predictions
44
Applications
The R WS is not restricted to ‘atomic’ functionality
Can write a whole R program
Load it on the R compute server
Provide a Java WS frontend
Examples: feature selection, automated model generation, pharmacokinetic parameter calculation
45
Data Input/Output
Most modeling applications require data matrices
Depending on client language we can use:
SOAP array of arrays (2D matrices)
SOAP array (1D vector form of a 2D matrix)
VOTables
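The two array forms carry the same information; the 1D form just needs the matrix dimensions alongside it. A minimal sketch of the conversion follows, assuming row-by-row (row-major) ordering. The slides do not state which ordering the services expect, and R itself fills matrices column by column by default, so a real client would need to check.

```python
def flatten(matrix):
    """Flatten a rectangular 2D list into (values, n_rows, n_cols), row by row."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")
    values = [value for row in matrix for value in row]
    return values, n_rows, n_cols


def unflatten(values, n_rows, n_cols):
    """Rebuild the 2D list from its row-major 1D form."""
    if len(values) != n_rows * n_cols:
        raise ValueError("length does not match the given dimensions")
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


if __name__ == "__main__":
    flat, rows, cols = flatten([[1, 2, 3], [4, 5, 6]])
    print(flat, rows, cols)
    print(unflatten(flat, rows, cols))
```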
46
Data Input/Output
Some R web services can take a URL to a VOTables document
Conversion to R or Java matrices is done by a local VOTables Java library
R also has basic support for VOTables directly
Ignores binary data streams
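For orientation, the sketch below writes a tab-delimited table in the basic VOTable layout (VOTABLE > RESOURCE > TABLE, with FIELD declarations and TABLEDATA rows). It is a simplification written in Python rather than the project's Java library: every column is declared as character data, the XML namespace declaration is omitted, and the output has not been validated against the VOTable schema.

```python
import xml.etree.ElementTree as ET


def tab_to_votable(tab_text):
    """Convert tab-delimited text (first line = column names) to VOTable-style XML."""
    lines = [line for line in tab_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name in header:
        # Simplification: treat every column as a variable-length string.
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for line in lines[1:]:
        row = ET.SubElement(tabledata, "TR")
        for cell in line.split("\t"):
            ET.SubElement(row, "TD").text = cell
    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    print(tab_to_votable("id\tvalue\nA\t1.0\nB\t2.0"))
```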
47
Interacting With R WS’s
Traditional WS’s do not maintain state
Predictive models are different:
A model is built at one time
May be used for prediction at another time
Need to maintain state
State is maintained by serialization to R binary files on the compute server
Clients deal with model ID’s
48
Interacting with R WS’s
Protocol:
Send data to model WS
Get back model ID
Get various information via model ID: fitted values, training statistics, new predictions
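The model-ID protocol can be mimicked in a few lines. This sketch is an analogy, not the project's implementation: Python's `pickle` files stand in for R binary files, a temporary directory stands in for the compute server, and a one-variable least-squares fit stands in for the R model. All function names are hypothetical.

```python
import os
import pickle
import tempfile
import uuid

MODEL_DIR = tempfile.mkdtemp(prefix="models_")  # stands in for server-side storage


def build_model(x, y):
    """Fit y = a + b*x by least squares, serialize the model, and return its ID."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    model = {
        "intercept": intercept,
        "slope": slope,
        "fitted": [intercept + slope * xi for xi in x],
    }
    model_id = uuid.uuid4().hex
    with open(os.path.join(MODEL_DIR, model_id + ".pkl"), "wb") as handle:
        pickle.dump(model, handle)
    return model_id


def _load(model_id):
    """Restore a serialized model; this is where state survives between calls."""
    with open(os.path.join(MODEL_DIR, model_id + ".pkl"), "rb") as handle:
        return pickle.load(handle)


def fitted_values(model_id):
    """Return the fitted values stored with the model."""
    return _load(model_id)["fitted"]


def predict(model_id, new_x):
    """Apply a previously built model to new inputs."""
    model = _load(model_id)
    return [model["intercept"] + model["slope"] * xi for xi in new_x]


if __name__ == "__main__":
    model_id = build_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(model_id, fitted_values(model_id), predict(model_id, [4.0]))
```

Because the client only ever holds the ID, each call stays stateless on the wire while the model itself persists on the server.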
49
Cheminformatics at Indiana University School of Informatics
David J. Wild
[email protected]
Associate Director of Chemical Informatics & Assistant Professor
Indiana University School of Informatics, Bloomington
http://djwild.info
50
Cheminformatics education at Indiana
M.S. in Chemical Informatics
2 years, 36 semester hours
Includes a 6-hour capstone / research project
Opportunity to work in Laboratory Informatics (IUPUI) or closely with Bioinformatics (IUB)
Currently 9 students enrolled
Ph.D. in Informatics, Cheminformatics Specialty
90 credit hours, including 30 hours dissertation research. Usually 4 years.
Research rotations expose students to research in related areas
Currently 4 students enrolled
Graduate Certificate
4 courses, all available by Distance Education:
I571 Chemical Information Technology
I572 Computational Chemistry & Molecular Modeling
I573 Programming for Science Informatics
I553 Independent Study in Chemical Informatics
D.E. students pay in-state fees! (~$800 per class)
See http://cheminfo.informatics.indiana.edu for more information, or a general review of cheminformatics education in Drug Discovery Today 11, 9&10 (May 2006), pp 436-439
51
Distance Education for Cheminformatics
Uses Breeze + teleconference for live sharing of classes: all that is required is a P.C. and a telephone. Optional Polycom videoconferencing.
Lectures are recorded for easy playback through a web browser
Wiki or similar webpage for dissemination of course materials
Also participate in CIC courseshare to give class at University of Michigan
Of 75 students taking our courses since fall 2005, 39 have been D.E. students
See JCIM 2006; 46(2) pp 495 - 502 for more details
52
Current research in the Wild lab
Integration of cheminformatics tools and data sources
A web service infrastructure for cheminformatics
Compound information & aggregation web service and interface (“by the way box”)
An enhanced chatbot for exploiting chemical information & web services
Semantically-aware workflow tools for cheminformatics
Data mining the NIH DTP tumor cell line database
PubDock: a docking database for PubChem
Aggregating life science information from web and journal documents
Data mining semantically rich chemistry journal articles
Document similarity based on chemical structure similarity
Evaluating semantic markup of chemistry journal articles
Integrating cheminformatics into the chemistry lab
Integrating cheminformatics with the Second Life virtual world
Integrating cheminformatics tools with electronic lab notebooks
Usability of cheminformatics tools
53
Current research in the Guha lab
Predictive Modeling
Interpretation, validation, domain applicability
Generalization to other ‘models’ such as docking, pharmacophore etc
Integration of multiple data types
Addressing imbalanced and noisy datasets
Analysis of Chemical Spaces
Quantify distributions in spaces
Investigation of density approaches
Applications to lead hopping, model domains
Methods to summarize & compare data
Applications to HTS and smaller lead series type datasets
Network models combining chemical structures and biological systems
Software and infrastructure
Model exchange and annotation
Pharmacophore representations, matching
Toolkit development (CDK)
54
Cheminformatics web service infrastructure
Database Services
PostgreSQL + gNova
PubChem mirror (augmented)
Pub3D - 3D structures for PubChem
PubDock - Bound 3D structures
Compound-indexed journal article DB
NIH Human Tumor Cell Line
Local PubChem mirror
VARUNA quantum chemistry database
Statistics (based on R)
Regression, LDA
Neural Nets, Random Forest
K-means clustering
Plotting
T-test and distribution sampling
Cheminformatics services
Docking (FRED)
3D structure generation (OMEGA)
Filtering (FRED, etc)
OSCAR3
Fingerprints (BCI, CDK)
Clustering (BCI)
Toxicity prediction (ToxTree)
R-based predictive models
Similarity calculations (CDK)
Descriptor calculation (CDK)
2D structure diagrams (CDK)
Xiao Dong, Kevin E. Gilbert, Rajarshi Guha, Randy Heiland, Jungkee Kim, Marlon E. Pierce, Geoffrey C. Fox and David J. Wild, Web service infrastructure for chemoinformatics, Journal of Chemical Information and Modeling, 2007; 47(4) pp 1303-1307
55
RSC Project Prospect - what can we do with the information?
www.projectprospect.org
>100 papers marked up with SMILES/InChI (using OSCAR3), plus Gene Ontology and Goldbook Ontology terms
Created similarity searchable PostgreSQL / gNova database with paper DOIs, SMILES, and ontology terms
Web service and simple HTML interfaces for searching … “which papers reference compounds similar to this one in the scope of these ontological terms?”
Applying statistics to look at co-occurrence of compounds, structural features (MACCS keys) and ontological terms in papers
56
Greasemonkey / OSCAR script
http://cheminfo.informatics.indiana.edu:8080/ChemGM/index.jsp
57
By the way… annotation (mock-up!)
By the way…
This compound is very similar to a prescription drug, Tamoxifen.
This compound is referenced in 20 journal articles published in the last 5 years
Similar compounds are associated with the words “toxic” and “death” in 280 web pages
It appears to be covered under 3 patents
It has been shown to be active in 5 screens
Computer models predict it to show some activity against 8 protein targets
Here are some comments on this compound:
David Wild: don’t take any notice of the computational models - they are rubbish
58
Some useful chemical reactions
Iodoacetate a Iodoacetamide I-CH2COO- ICH2CONH2
This may also react, chem favored by alkaline pH
….
Cheminformatics aware simple lab notebook (mock up!)
Free text input can be converted to machine readable form by Electrovaya
Automatic detection of data fields (yield, etc) where possible
Plug-in allows structures to be drawn with the pen and cleaned up
Web service interface provides access to computation and searching. Page is marked up by what is possible
FIND INFO ABOUT THIS REACTION
[Figure: hand-drawn reaction scheme; atom labels not recoverable]
59
Automatic workflow generation and natural language queries
Develop service ontology using OWL-S or similar language
Allows service interoperability, replacement and input/output compatibility
We can then use generic reasoning and network analysis tools to find paths from inputs to desired outputs
Natural language can be parsed to inputs and desired outputs
Smart Clients <--> Agents <--> Services
Possible “supercharged life science Google?” - e.g. type in “what compounds might bind to the enclosed protein?”
[Figure: service graph. Services: 2D structure crawler, 2D similarity, 2D -> 3D, 3D search, pharmacophore search, dock. Data types: 2D structures, 3D structures, 3D protein structure, 3D structures & complexes, result. Annotations: dock = bind; 2D structures are compounds; 3D structures are compounds]
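Finding "paths from inputs to desired outputs" can be sketched as a graph search over service descriptions. The example below uses a breadth-first search, which is one simple option and not necessarily what the project intends. The service names echo the figure labels, but the input and output types assigned to each service are assumptions, and each service is simplified to a single input type (docking, for instance, would also need the protein structure).

```python
from collections import deque

# Hypothetical service descriptions: name -> (input type, output type).
SERVICES = {
    "2D structure crawler": ("query", "2D structures"),
    "2D similarity": ("2D structures", "2D structures"),
    "2D -> 3D": ("2D structures", "3D structures"),
    "dock": ("3D structures", "3D structures & complexes"),
}


def find_workflow(services, start_type, goal_type):
    """Breadth-first search for the shortest service chain from start_type to goal_type."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, (in_type, out_type) in services.items():
            # Follow a service only if it accepts the current type and leads somewhere new.
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None


if __name__ == "__main__":
    print(find_workflow(SERVICES, "2D structures", "3D structures & complexes"))
```

With an ontology such as OWL-S supplying the type information, the `SERVICES` table would be derived from service metadata instead of being written by hand.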