NBCC Open Notebook Science Talk

Accelerating Discovery by Sharing: a case for Open Notebook Science. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. May 1, 2011, National Breast Cancer Coalition Annual Advocacy Conference.

description

Jean-Claude Bradley presents "Accelerating Discovery by Sharing: a case for Open Notebook Science" at the National Breast Cancer Coalition Annual Advocacy Conference in Arlington, VA on May 1, 2011.

Transcript of NBCC Open Notebook Science Talk

Page 1: NBCC Open Notebook Science Talk

Accelerating Discovery by Sharing: a case for Open Notebook Science

Jean-Claude Bradley

May 1, 2011

National Breast Cancer Coalition Annual Advocacy Conference

Associate Professor of Chemistry, Drexel University

Page 2: NBCC Open Notebook Science Talk

Outline

1. Trends in sharing for drug discovery

2. ONS for malaria research

3. Crowdsourcing solubility with ONS

4. Leveraging the educational system to contribute new science

5. Open modeling and web services

6. Discovering connections

7. Moving forward: tools and practices

Page 3: NBCC Open Notebook Science Talk

Industry is Sharing More

Page 4: NBCC Open Notebook Science Talk

Opportunities for Competitive Collaboration

Page 5: NBCC Open Notebook Science Talk

Some Initiatives Promoting More Openness in Drug Discovery

Page 6: NBCC Open Notebook Science Talk

Motivation: Faster Science, Better Science

Page 7: NBCC Open Notebook Science Talk

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 8: NBCC Open Notebook Science Talk

TRUST

PROOF

Page 9: NBCC Open Notebook Science Talk

Strategy for an Open Notebook:

First record then abstract structure

In order to be discoverable use Google friendly formats (simple HTML, no login)

In order to be replicable use free hosted tools (Wikispaces, Google Spreadsheets)

Page 10: NBCC Open Notebook Science Talk

UsefulChem Project: Open Primary Research in Drug Design using Web2.0 tools

Docking

Synthesis

Testing

Rajarshi Guha, Indiana U

JC Bradley, Drexel U

Phil Rosenthal, UCSF (malaria)

Dan Zaharevitz, NCI (tumors)

Tsu-Soo Tan, Nanyang Inst.

Page 11: NBCC Open Notebook Science Talk

Malaria Target: falcipain-2 involved in hemoglobin metabolism

Dana.org

Page 12: NBCC Open Notebook Science Talk

The Ugi Reaction

Page 13: NBCC Open Notebook Science Talk

Outcome of Guha-Bradley-Rosenthal collaboration

Page 14: NBCC Open Notebook Science Talk

References to papers, blog posts, lab notebook pages, raw data

Page 15: NBCC Open Notebook Science Talk

The Ugi reaction: can we predict precipitation?

Can we predict solubility in organic solvents?

Page 16: NBCC Open Notebook Science Talk

Crowdsourcing Solubility Data

Page 17: NBCC Open Notebook Science Talk

ONS Challenge Judges

Page 18: NBCC Open Notebook Science Talk

ONS Challenge Award Winners

Page 19: NBCC Open Notebook Science Talk

Solubilities collected in a Google Spreadsheet

Page 20: NBCC Open Notebook Science Talk

Rajarshi Guha’s Live Web Query using Google Viz API

Page 21: NBCC Open Notebook Science Talk

Data provenance: From Wikipedia to…

Page 22: NBCC Open Notebook Science Talk

…the lab notebook and raw data

Page 23: NBCC Open Notebook Science Talk

Interactive NMR spectra using JSpecView and JCAMP-DX
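JCAMP-DX is a plain-text format in which metadata is stored as labelled records of the form ##LABEL=value, which is part of what makes it convenient for open spectral data. As a minimal sketch (not the JSpecView implementation), the following Python reads only those header records from a string. The sample spectrum and the function name parse_jcamp_headers are hypothetical, and the data table itself is not decoded here.

```python
def parse_jcamp_headers(text):
    """Collect ##LABEL=value records from a JCAMP-DX string into a dict."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##"):
            continue  # data rows and blank lines are skipped in this sketch
        label, sep, value = line[2:].partition("=")
        if not sep:
            continue  # not a labelled record
        # Labels are normalized: upper case, with spaces, dashes,
        # underscores, and slashes discarded.
        key = label.upper()
        for ch in " -_/":
            key = key.replace(ch, "")
        headers[key] = value.strip()
    return headers


# Hypothetical, heavily shortened file content for illustration only.
SAMPLE = """##TITLE=hypothetical sample spectrum
##JCAMP-DX=4.24
##DATA TYPE=NMR SPECTRUM
##XUNITS=HZ
##YUNITS=ARBITRARY UNITS
##NPOINTS=3
##XYDATA=(X++(Y..Y))
0.0 10 20 30
##END=
"""

headers = parse_jcamp_headers(SAMPLE)
print(headers["TITLE"], headers["DATATYPE"], headers["NPOINTS"])
```

A real reader would also need to decode the XYDATA table, including the compressed number forms the format allows.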

Page 24: NBCC Open Notebook Science Talk

(Andy Lang, Tony Williams)

Open Data JCAMP spectra for education

(Andy Lang, Tony Williams, Robert Lancashire)

Page 25: NBCC Open Notebook Science Talk

Raw Data As Images

Splatter?

Some liquid

Page 26: NBCC Open Notebook Science Talk

YouTube for demonstrating experimental set-up

Page 27: NBCC Open Notebook Science Talk

The importance of raw data availability

Missed in a prior publication on solubility for this compound

Page 28: NBCC Open Notebook Science Talk

Case study: Chemical Information Retrieval course at Drexel (Fall 2009/2010)

Leveraging the educational system to contribute new science

Page 29: NBCC Open Notebook Science Talk

The Chemical Information Validation Sheet

567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course

Page 30: NBCC Open Notebook Science Talk

The Chemical Information Validation Explorer

(Andrew Lang)

Page 31: NBCC Open Notebook Science Talk

Discovering outliers for melting points (stdev/average)
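One way to read the "stdev/average" criterion is as a relative spread: for each compound, divide the standard deviation of its reported melting points by their average, and inspect the compounds with the largest ratio. The sketch below assumes temperatures in kelvin (so the average cannot be near zero), uses made-up compound names and values, and picks an arbitrary threshold of 0.05. None of these come from the talk.

```python
from statistics import mean, stdev


def relative_spread(values):
    """Sample standard deviation divided by the average."""
    return stdev(values) / mean(values)


def flag_outliers(measurements, threshold=0.05):
    """Return {compound: ratio} for compounds whose ratio exceeds threshold."""
    flagged = {}
    for compound, values in measurements.items():
        if len(values) < 2:
            continue  # a spread needs at least two reported values
        ratio = relative_spread(values)
        if ratio > threshold:
            flagged[compound] = ratio
    return flagged


# Hypothetical melting points in kelvin from several sources per compound.
measurements = {
    "compound_a": [395.0, 396.0, 395.5],
    "compound_b": [300.0, 350.0, 301.0],
    "compound_c": [410.0],
}

flagged = flag_outliers(measurements)
for compound, ratio in flagged.items():
    print(f"{compound}: stdev/average = {ratio:.3f}")
```

A flagged compound is only a candidate for checking against the original sources. The ratio alone does not say which reported value is wrong.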

Page 32: NBCC Open Notebook Science Talk

Investigating the m.p. inconsistencies of EGCG

Page 33: NBCC Open Notebook Science Talk

Investigating the m.p. inconsistencies of cyclohexanone

Page 34: NBCC Open Notebook Science Talk

Sigma-Aldrich, Acros and Wolfram Alpha apparently use the same sources for melting points

Page 35: NBCC Open Notebook Science Talk

Sigma-Aldrich, Acros and Wolfram Alpha apparently use the same sources for boiling points

Page 36: NBCC Open Notebook Science Talk

Sigma-Aldrich, Acros and Wolfram Alpha apparently DO NOT use the same sources for flash points

Page 37: NBCC Open Notebook Science Talk

Most popular data sources

Page 38: NBCC Open Notebook Science Talk

Alfa Aesar donates melting points to the public

Page 39: NBCC Open Notebook Science Talk

Open Melting Point Explorer

Page 40: NBCC Open Notebook Science Talk

Outliers

MDPI dataset

EPI (via ChemSpider)

Page 41: NBCC Open Notebook Science Talk

Outliers

Alfa Aesar

Page 42: NBCC Open Notebook Science Talk

Inconsistencies and SMILES problems within MDPI dataset

Page 43: NBCC Open Notebook Science Talk

MDPI Dataset labeled with High Trust Level

Page 44: NBCC Open Notebook Science Talk

Open Melting Point Datasets

Page 45: NBCC Open Notebook Science Talk

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78, TPSA and nHdon most important
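The slide does not say how its R² value was calculated. A common definition is the coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition and uses invented experimental and predicted values. It is not the Random Forest model itself.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized lists with at least 2 values")
    average = sum(observed) / len(observed)
    ss_tot = sum((y - average) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical melting points (degrees C): experimental vs. model output.
experimental = [100.0, 150.0, 200.0, 250.0]
model_output = [110.0, 140.0, 210.0, 240.0]

r2 = r_squared(experimental, model_output)
print(round(r2, 3))
```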

Page 46: NBCC Open Notebook Science Talk

Melting point prediction service

Page 47: NBCC Open Notebook Science Talk

Other Web Services…

(Andrew Lang)

General Transparent Solubility Prediction

Page 48: NBCC Open Notebook Science Talk

Convenient web services for solubility measurement and prediction

(Andrew Lang)

Page 49: NBCC Open Notebook Science Talk

Integration of Multiple Web Services to Recommend Solvents for Reactions

(Andrew Lang)

Page 50: NBCC Open Notebook Science Talk

Using melting point for temperature dependent solubility prediction
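The slide does not spell out the model, so the following is only one plausible sketch of how a melting point can anchor a temperature-dependent estimate. It assumes that the logarithm of the mole-fraction solubility is linear in 1/T, and that solute and solvent become fully miscible (mole fraction 1) at the solute's melting point. Given one measured solubility at a reference temperature, the line between those two points gives an estimate at other temperatures. The function name and the numbers are hypothetical.

```python
import math


def mole_fraction_solubility(temp_k, ref_temp_k, ref_mole_fraction,
                             melting_point_k):
    """Interpolate ln(x) linearly in 1/T between a reference point and
    the assumed x = 1 at the melting point (all temperatures in kelvin)."""
    if not 0 < ref_temp_k < melting_point_k:
        raise ValueError("reference temperature must be below melting point")
    if not 0 < temp_k <= melting_point_k:
        raise ValueError("temperature must be in (0, melting point]")
    if not 0 < ref_mole_fraction <= 1:
        raise ValueError("mole fraction must be in (0, 1]")
    # Fraction of the way (in 1/T) from the melting point to the reference.
    fraction = ((1 / temp_k - 1 / melting_point_k)
                / (1 / ref_temp_k - 1 / melting_point_k))
    return math.exp(math.log(ref_mole_fraction) * fraction)


# Hypothetical solute: x = 0.01 at 298.15 K, melting point 400 K.
for t in (298.15, 350.0, 400.0):
    x = mole_fraction_solubility(t, 298.15, 0.01, 400.0)
    print(f"{t:.2f} K -> mole fraction {x:.4f}")
```

The estimate reproduces the measured value at the reference temperature and rises to 1 at the melting point. How well it behaves in between depends on how well the two assumptions hold for the system in question.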

Page 51: NBCC Open Notebook Science Talk

Semi-Automated Measurement of solubility via web service analysis of JCAMP-DX files

(Andy Lang)

Page 52: NBCC Open Notebook Science Talk
Page 53: NBCC Open Notebook Science Talk
Page 54: NBCC Open Notebook Science Talk

Solubility Prediction (Andy Lang uses Abraham Model)
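The Abraham model is a linear free energy relationship of the form log P = c + e·E + s·S + a·A + b·B + v·V, where E, S, A, B and V are descriptors of the solute and c, e, s, a, b and v are coefficients fitted for each solvent. A minimal sketch, assuming that P is the water-to-solvent partition coefficient and that solubility in the solvent is estimated as log S(solvent) = log S(water) + log P. All numeric values below are invented for illustration and are not published coefficients or descriptors.

```python
from dataclasses import dataclass


@dataclass
class SolventCoefficients:
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


@dataclass
class SoluteDescriptors:
    E: float
    S: float
    A: float
    B: float
    V: float


def abraham_log_p(coef, solute):
    """log P = c + e*E + s*S + a*A + b*B + v*V."""
    return (coef.c + coef.e * solute.E + coef.s * solute.S
            + coef.a * solute.A + coef.b * solute.B + coef.v * solute.V)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
solvent = SolventCoefficients(c=0.2, e=0.4, s=-1.0, a=-3.0, b=-4.0, v=4.0)
solute = SoluteDescriptors(E=1.0, S=1.0, A=0.5, B=0.5, V=1.0)

log_p = abraham_log_p(solvent, solute)
log_s_water = -2.0  # hypothetical log10 of aqueous solubility (M)
log_s_solvent = log_s_water + log_p
print(f"log P = {log_p:.2f}, log S(solvent) = {log_s_solvent:.2f}")
```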

Page 55: NBCC Open Notebook Science Talk

Reaction Attempts Book

Page 56: NBCC Open Notebook Science Talk

Reaction Attempts Book: Reactants listed Alphabetically

Page 57: NBCC Open Notebook Science Talk
Page 58: NBCC Open Notebook Science Talk

Dynamic links to private tagged Mendeley collections

(Andrew Lang)

Page 59: NBCC Open Notebook Science Talk

All ONS web services

Page 60: NBCC Open Notebook Science Talk

For all Formats of ONS Projects

Page 61: NBCC Open Notebook Science Talk

ONS Challenge Solubility Book cited for nanotechnology application

Page 62: NBCC Open Notebook Science Talk

Visualizing molecule-researcher connection maps reveals link between 2 Open Notebooks (Todd and Bradley)

(Don Pellegrino)

Page 63: NBCC Open Notebook Science Talk

The Intersection of Open Notebooks (Bradley/Todd) and IP implications

Open Notebook could have blocked patent if done earlier

Page 64: NBCC Open Notebook Science Talk

Decanoic acid

Water

NaCl

Page 65: NBCC Open Notebook Science Talk

Phrase searching for useful solubility applications

Page 66: NBCC Open Notebook Science Talk

Search for applications of solubility for breast cancer research

Page 67: NBCC Open Notebook Science Talk

Solubility prediction for Taxol using Abraham descriptors

Pred Exp

Page 68: NBCC Open Notebook Science Talk

Predicted temperature dependent solubility of Taxol in water (M)

Page 69: NBCC Open Notebook Science Talk

Current research questions for Taxol solubility

1. Does Taxol have a meaningful solubility in methanol or does it decompose too quickly?

2. Why is methanol reported to decompose Taxol but not ethanol?

3. Can the solubility of Taxol in solvent mixtures be predicted, especially for approved excipients?

4. Can the solubility of Taxol analogs be used to create reliable models for the solubility of this class of compounds?

Page 70: NBCC Open Notebook Science Talk

Moving Forward: Tools and Practices

Use free hosted web tools and open data formats

1. Google Spreadsheets (numerical data)

2. Wikispaces (human readable format)

3. YouTube, SlideShare, LuLu, Nature Precedings, etc. (multiple data formats)

4. JCAMP-DX for spectral data

Practices

1. Report all findings immediately, even if tentative

2. Participate in social media to share progress and find collaborators

3. Abstract experiments and findings to machine readable formats and make these easily discoverable
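As an illustration of the last practice, a finding written up in a notebook page can be abstracted into a row of a simple table that both people and programs can read. The sketch below writes such rows as CSV with Python's standard library. The field names, the measurement, and the example.org link are all hypothetical and do not reflect the actual layout of the ONS spreadsheets.

```python
import csv
import io

# Hypothetical column layout for an abstracted solubility measurement.
FIELDS = ["solute", "solvent", "temperature_c",
          "concentration_molar", "notebook_page"]


def to_csv(records):
    """Serialize a list of measurement dicts to CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {
        "solute": "hypothetical solute",
        "solvent": "methanol",
        "temperature_c": 25,
        "concentration_molar": 0.5,
        # Link back to the notebook page keeps the provenance of the value.
        "notebook_page": "https://example.org/notebook/exp001",
    },
]

csv_text = to_csv(records)
print(csv_text)
```

Keeping a link to the notebook page in every row lets a reader trace any value back to the record it came from.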