Chemical Information Retrieval Class 1
Chemical Information Retrieval 2012
Jean-Claude Bradley
September 28, 2012
First Class
Associate Professor of Chemistry, Drexel University
CHEM367/767 Drexel University
Finding reliable chemical information
can be really hard
After this class, you should feel that
you can never blindly trust
chemical data sources again
But… you will learn how to do the best you can
with imperfect information
The Chemical Information Validation Sheet
567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points (stdev/average)
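The stdev/average ratio named on this slide can be sketched in a few lines of Python. The values used are the six melting points for 4-benzyltoluene quoted later in this transcript. The function name `spread_ratio` and the cutoff of 0.05 are illustrative assumptions, not the settings used for the validation sheet.

```python
from statistics import mean, stdev


def spread_ratio(values_c):
    """Return the sample standard deviation divided by |average| (values in C)."""
    avg = mean(values_c)
    if avg == 0:
        raise ValueError("average is zero; the ratio is undefined")
    return stdev(values_c) / abs(avg)


# Melting points (C) reported for 4-benzyltoluene by the sources listed in this deck.
reported = [5, -30, 125, 97.5, -30, 4.58]

THRESHOLD = 0.05  # assumed cutoff, chosen only for illustration
ratio = spread_ratio(reported)
needs_review = ratio > THRESHOLD
print(f"stdev/average = {ratio:.2f}, needs review: {needs_review}")
```

Because Celsius averages can sit close to zero, the ratio can blow up for compounds that melt near 0 C; converting to kelvin before dividing is one possible workaround.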
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer
(Andrew Lang)
Outliers
MDPI dataset
EPI (also donated all data to the public)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
What is the melting point of 4-benzyltoluene?
American Petroleum Institute: 5 C
PHYSPROP: -30 C
PHYSPROP: 125 C
peer reviewed journal (2008): 97.5 C
government database: -30 C
government database: 4.58 C
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature
and can be frozen below -30 C
Open Lab Notebook page measuring the melting point of 4-benzyltoluene
Motivation: Faster Science, Better Science
Ruling out all melting points above -15C?
Oops – 4-benzyltoluene freezes after 16 days at -15C!
Measuring the melting point by slowly heating from -15 C gives 5 C
There are NO FACTS, only measurements embedded
within assumptions
Open Notebook Science maintains the integrity of data
provenance by making assumptions explicit
“Simple” aldol condensation synthesis
Top Hit (no reports of synthesis)
In top ten (a few reports of synthesis) (Andrew Lang)
Information from the literature on the target synthesis
An example of a “failed experiment” in an Open Notebook with useful
information
A successful synthesis by avoiding water, dramatically increasing NaOH, and using a long reaction
time
Open Random Forest modeling of Open Melting Point data using CDK descriptors
(Andrew Lang)
R² = 0.78; TPSA and nHdon most important
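For readers unfamiliar with the statistic quoted on this slide, here is a minimal sketch of one common definition of the coefficient of determination, 1 - SS_res/SS_tot. The slide does not say which definition was used for the Random Forest model, so treat this as an assumption. The numbers are placeholders invented for illustration and are not Open Melting Point data.

```python
def r_squared(measured, predicted):
    """Coefficient of determination computed as 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    avg = sum(measured) / len(measured)
    # Residual sum of squares: how far the predictions are from the measurements.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: how far the measurements are from their own average.
    ss_tot = sum((m - avg) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Placeholder values, invented for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(measured, predicted), 3))
```

A model that always predicts the average scores 0 under this definition and a perfect model scores 1, which gives a rough sense of where 0.78 sits.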
Melting point prediction service
Web services for summary data
(Andrew Lang)
Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis
Calling Google Apps Scripts
(Andrew Lang and Rich Apodaca)
Google Apps Scripts for conveniently exploring melting
point data
Straight chain carboxylic acids from 1 to 10 carbons
Straight chain alcohols from 1 to 10 carbons
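Homologous series like the two above are easy to enumerate as SMILES strings, which can then be passed to whatever lookup is used. The sketch below only builds the strings. The function names are hypothetical, and how the Google Apps Scripts on these slides actually accept their input is not shown in the transcript.

```python
def straight_chain_alcohol(n_carbons):
    """SMILES for the straight-chain primary alcohol with n carbons (n >= 1)."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    return "C" * n_carbons + "O"


def straight_chain_acid(n_carbons):
    """SMILES for the straight-chain carboxylic acid with n carbons (n >= 1)."""
    if n_carbons < 1:
        raise ValueError("need at least one carbon")
    # The carboxyl carbon counts toward the total, so the alkyl part has n - 1 carbons.
    return "C" * (n_carbons - 1) + "C(=O)O"


for n in range(1, 11):
    print(n, straight_chain_alcohol(n), straight_chain_acid(n))
```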
Comparison of model with triple validated measurements
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
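The single-source flag mentioned on this slide can be sketched as a count of independent sources per compound. This assumes that validation status depends only on how many distinct sources report a value, which is a simplification. The source labels and the counts for every compound other than cyclobutylamine are hypothetical placeholders.

```python
def validation_status(sources):
    """Classify a compound by the number of distinct sources reporting its melting point."""
    n = len(set(sources))
    if n == 0:
        return "no data"
    if n == 1:
        return "flag for validation (single source)"
    if n == 2:
        return "two sources"
    return "triple validated"


# Source labels are placeholders; only the single-source status of
# cyclobutylamine comes from the slide.
sources_by_compound = {
    "cyclopropylamine": ["source A", "source B", "source C"],
    "cyclobutylamine": ["source A"],
    "cyclopentylamine": ["source A", "source B", "source C"],
    "cyclohexylamine": ["source A", "source B", "source C"],
}

for name, sources in sources_by_compound.items():
    print(f"{name}: {validation_status(sources)}")
```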
Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
Google Apps Scripts web services
Integration of Multiple Web Services to Recommend Solvents
for Reactions
(Andrew Lang)
What are good solvents to recrystallize benzoic acid?
(Andrew Lang)
Click on the solvent to see temp curve
(Andrew Lang)
Deliver melting point data via App
(Andrew Lang)
Web services from data collected in this class will be added here
In this class you will learn
How to search Science1.0 resources
• Peer-reviewed journals
• Commercial databases
• Patents
• Conference proceedings
In this class you will learn
How to participate in Science2.0
• wikis (Wikipedia, class wiki)
• blogs
• interactive databases (ChemSpider)
• social software (Twitter, FriendFeed)
In this class you will learn
How to leverage Science3.0
(via collaboration with Andrew Lang)
• machine-readable web services
Now let's take a look at the class wiki