Chemical Information Retrieval Class 1


Jean-Claude Bradley presents the first lecture of Chemical Information Retrieval in the Fall of 2012 at Drexel University.

Transcript of Chemical Information Retrieval Class 1

Page 1: Chemical Information Retrieval Class 1

Chemical Information Retrieval 2012

Jean-Claude Bradley

September 28, 2012

First Class

Associate Professor of Chemistry, Drexel University

CHEM367/767 Drexel University

Page 2

Finding reliable chemical information can be really hard

Page 3

After this class, you should feel that you can never blindly trust chemical data sources again

Page 4

But… you will learn how to do the best you can with imperfect information

Page 5

The Chemical Information Validation Sheet

567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course

Page 6

Discovering outliers for melting points (stdev/average)
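The stdev/average ratio is the coefficient of variation of the values that different sources report for one compound, so sorting on it pushes the most inconsistent compounds to the top. Below is a minimal Python sketch, not the validation sheet's actual formula. It assumes melting points are given in °C, and the function names and the 0.05 threshold are hypothetical. It also converts to kelvin before dividing, because an average near 0 °C makes the ratio explode or turn negative. Whether the original sheet did this is not stated. The input is the six values reported for 4-benzyltoluene later in this deck.

```python
from statistics import mean, stdev


def coefficient_of_variation(mp_celsius):
    """Return stdev/average of melting points, computed in kelvin.

    Assumption: inputs are in degrees Celsius. Converting to kelvin keeps the
    average positive, so the ratio stays meaningful for compounds melting near 0 °C.
    """
    if len(mp_celsius) < 2:
        raise ValueError("need at least two measurements to estimate a spread")
    kelvin = [t + 273.15 for t in mp_celsius]
    return stdev(kelvin) / mean(kelvin)  # sample standard deviation / average


def flag_outlier_compounds(measurements, threshold=0.05):
    """Return {compound: ratio} for compounds whose stdev/average exceeds threshold.

    measurements maps a compound name to its reported melting points (°C).
    Compounds with a single value are skipped: no spread can be computed.
    The 0.05 threshold is a hypothetical choice for illustration.
    """
    flagged = {}
    for compound, values in measurements.items():
        if len(values) < 2:
            continue
        ratio = coefficient_of_variation(values)
        if ratio > threshold:
            flagged[compound] = round(ratio, 3)
    # Most inconsistent compounds first, like sorting the stdev/average column.
    return dict(sorted(flagged.items(), key=lambda item: item[1], reverse=True))


# Values reported for 4-benzyltoluene (°C), taken from the slide asking for its melting point.
reported = {"4-benzyltoluene": [5, -30, 125, 97.5, -30, 4.58]}
print(flag_outlier_compounds(reported))  # prints {'4-benzyltoluene': 0.22}
```

A compound whose sources agree to within a fraction of a degree stays far below the threshold. Compounds with a single source never appear, which is why those need a separate "single source" flag rather than an outlier score.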

Page 7

Investigating the m.p. inconsistencies of EGCG

Page 8

Investigating the m.p. inconsistencies of cyclohexanone

Page 9

Most popular data sources

Page 10

Alfa Aesar donates melting points to the public

Page 11

Open Melting Point Explorer

(Andrew Lang)

Page 12

Outliers: MDPI dataset; EPI (which also donated all of its data to the public)

Page 13

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Page 14

Inconsistencies and SMILES problems within MDPI dataset

Page 15

MDPI Dataset labeled with High Trust Level

Page 16

Open Melting Point Datasets: currently 20,000 compounds with Open MPs

Page 17

What is the melting point of 4-benzyltoluene?

American Petroleum Institute: 5 °C
PHYSPROP: -30 °C
PHYSPROP: 125 °C
Peer-reviewed journal (2008): 97.5 °C
Government database: -30 °C
Government database: 4.58 °C

Page 18

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C

Page 19

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 20

Motivation: Faster Science, Better Science

Page 21

Ruling out all melting points above -15 °C?

Page 22

Oops, 4-benzyltoluene freezes after 16 days at -15 °C!

Page 23

Measuring the melting point by slowly heating from -15 °C gives 5 °C
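The 4-benzyltoluene slides can be read as a filter over the reported values, and writing that filter down makes the assumption behind each step explicit. Below is a minimal Python sketch. The function name `consistent_if_liquid_at` and the 20 °C used for "room temperature" are illustrative assumptions. The inference "still liquid at T, so the melting point is below T" only holds if the liquid is not supercooled, and that is exactly the assumption the 16-day freezer result broke.

```python
def consistent_if_liquid_at(reported_mps, observed_liquid_at, assume_no_supercooling=True):
    """Keep reported melting points (°C) compatible with 'the sample is liquid at T'.

    Inferring mp < T is only valid if the liquid is not supercooled. If that
    assumption is dropped, a liquid observation rules nothing out.
    """
    if not assume_no_supercooling:
        return list(reported_mps)
    return [mp for mp in reported_mps if mp < observed_liquid_at]


# Values reported for 4-benzyltoluene on the earlier slide, in °C.
reported = [5, -30, 125, 97.5, -30, 4.58]

# Liquid at room temperature (assumed to be 20 °C here): 125 and 97.5 drop out.
after_room_temp = consistent_if_liquid_at(reported, 20)
print(after_room_temp)  # prints [5, -30, -30, 4.58]

# Still liquid at -15 °C: taken at face value, only the -30 °C reports survive.
after_freezer = consistent_if_liquid_at(after_room_temp, -15)
print(after_freezer)  # prints [-30, -30]

# Without the no-supercooling assumption, the freezer observation excludes nothing.
print(consistent_if_liquid_at(after_room_temp, -15, assume_no_supercooling=False))
# prints [5, -30, -30, 4.58]
```

The slow-heating result of 5 °C is not in `after_freezer`. That mismatch signals that the filtering premise, not the measurement, was wrong.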

Page 24

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 25

“Simple” aldol condensation synthesis

Top hit (no reports of synthesis)

In top ten (a few reports of synthesis)

(Andrew Lang)

Page 26

Information from the literature on the target synthesis

Page 27

Information from the literature on the target synthesis

Page 28

An example of a “failed experiment” in an Open Notebook with useful information

Page 29

A successful synthesis by avoiding water, dramatically increasing the amount of NaOH and using a long reaction time

Page 30

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78; TPSA and nHdon were the most important descriptors
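R² here is the coefficient of determination. One common definition is 1 - SS_res / SS_tot: the residual sum of squares of the predictions divided by the total sum of squares around the mean of the measurements. Below is a minimal Python sketch of that definition. The function name is hypothetical. The slide does not say whether 0.78 was computed on held-out compounds, by cross-validation or as a random forest out-of-bag estimate.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 means perfect predictions; 0.0 means no better than always predicting
    the average measured value; negative values mean worse than that.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    avg = sum(measured) / len(measured)
    ss_tot = sum((y - avg) ** 2 for y in measured)                     # spread of the data
    ss_res = sum((y - f) ** 2 for y, f in zip(measured, predicted))    # model error
    if ss_tot == 0:
        raise ValueError("measured values have no variance")
    return 1 - ss_res / ss_tot
```

Read this way, an R² of 0.78 means the model's squared errors are about 22% of the variance of the measured melting points it was scored on.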

Page 31

Melting point prediction service

Page 32

Web services for summary data

(Andrew Lang)

Page 33

Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis

Page 34

Calling Google Apps Scripts

Page 35

Calling Google Apps Scripts

(Andrew Lang and Rich Apodaca)

Page 36

Google Apps Scripts for conveniently exploring melting point data

Page 37

Straight-chain carboxylic acids from 1 to 10 carbons

Straight-chain alcohols from 1 to 10 carbons

Comparison of model with triple-validated measurements

Page 38

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation: only a single source available)
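These two slides sort compounds by how many sources back a value. Some measurements are "triple validated", while cyclobutylamine is flagged because only one source is available. Below is a minimal Python sketch of that labelling. It rests on several assumptions that are not the course's stated definitions:

* each source contributes one value (a simplification, since the same database can list several);
* "validated" means the sources agree within a tolerance;
* the 2 °C tolerance, the labels and the function name are all hypothetical.

```python
def validation_status(values_by_source, tolerance=2.0):
    """Label a compound by how many independent sources report a melting point (°C).

    values_by_source maps a source name to the single value that source reports.
    Sources "agree" if the full range of their values is within tolerance.
    """
    n_sources = len(values_by_source)
    if n_sources == 0:
        return "no data"
    if n_sources == 1:
        # The cyclobutylamine case: nothing to cross-check against.
        return "flag for validation: single source"
    values = list(values_by_source.values())
    if max(values) - min(values) > tolerance:
        return "flag for validation: sources disagree"
    return "triple validated" if n_sources >= 3 else f"{n_sources} agreeing sources"
```

With a single source, the function returns the single-source flag regardless of the value. Two agreeing sources fall short of the triple-validated label.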

Page 39

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 40

Google Apps Scripts web services

Page 41

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 42

What are good solvents to recrystallize benzoic acid?

(Andrew Lang)

Page 43

Click on a solvent to see its temperature curve

(Andrew Lang)
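A good recrystallization solvent dissolves the compound well when hot and poorly when cold, and the temperature curves on this slide are what let you judge that. Below is a minimal Python sketch of how candidates could be ranked once you have a solubility-versus-temperature function for each solvent. It is not the method behind the service shown here. The function name, the 0 °C cold point and the choice to read the hot solubility 5 °C below each solvent's boiling point are hypothetical. The solubility curves themselves are assumed to come from elsewhere, for example a model or measured data.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical curve type: temperature in °C -> solubility in g solute per 100 g solvent.
SolubilityCurve = Callable[[float], float]


def rank_recrystallization_solvents(
    solvents: Dict[str, Tuple[SolubilityCurve, float]],
    cold_c: float = 0.0,
    margin_c: float = 5.0,
) -> List[Tuple[str, float, float]]:
    """Rank candidate solvents for recrystallization, best first.

    solvents maps a name to (solubility curve, boiling point in °C). The hot
    solubility is read margin_c degrees below the boiling point, the cold one at
    cold_c. Returns (name, recoverable g per 100 g solvent, fraction recovered).
    """
    ranking = []
    for name, (curve, bp_c) in solvents.items():
        hot = curve(bp_c - margin_c)
        cold = curve(cold_c)
        if hot <= 0 or hot <= cold:
            continue  # nothing dissolves hot, or cooling would not precipitate solute
        recoverable = hot - cold       # solute that can crystallize from a hot saturated solution
        fraction = recoverable / hot   # share of the dissolved solute recovered on cooling
        ranking.append((name, recoverable, fraction))
    # Sort by fraction recovered, breaking ties by the amount recoverable.
    return sorted(ranking, key=lambda row: (row[2], row[1]), reverse=True)
```

Ranking by fraction recovered favours solvents with a steep curve. The recoverable amount is kept alongside because a solvent that barely dissolves the compound even when hot would need impractically large volumes.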

Page 44

Delivering melting point data via an app

(Andrew Lang)

Page 45

Web services from data collected in this class will be added here

Page 46

In this class you will learn how to search Science 1.0 resources:

• Peer-reviewed journals
• Commercial databases
• Patents
• Conference proceedings

Page 47

In this class you will learn how to participate in Science 2.0:

• Wikis (Wikipedia, class wiki)
• Blogs
• Interactive databases (ChemSpider)
• Social software (Twitter, FriendFeed)

Page 48

In this class you will learn how to leverage Science 3.0 (via collaboration with Andrew Lang):

• Machine-readable web services

Page 49

Now let's take a look at the class wiki