2012 ACS Skolnik Symposium - ChemSpotlight

Post on 17-May-2015


Automated Molecular Data Extraction using Open Babel & ChemSpotlight:

The Semantic Desktop

Prof. Geoff Hutchison

Department of Chemistry

University of Pittsburgh

geoffh@pitt.edu

ACS CINF: Skolnik Symposium

21 August 2012

http://hutchison.chem.pitt.edu

“I can plug my iPod into any computer and it will recognize my music and give me all sorts of metadata: artist, title, type of music...

Why can’t I read the chemical metadata off my chemistry files?”

Prof. Henry S. Rzepa (Imperial College), Spring 2005 ACS Meeting, San Diego, CA

Pre-History: Chem://Dig

Index files, websites

Based on Chem MIME

Find files on extension

Perceive chemistry

Database Store

Search, Filter

Retrieval

H. Rzepa et al. New J. Chem (2002) 26 p. 656

Open Babel

Open Babel (Started 2001)

http://openbabel.org/

Free, open source chemical toolbox

Cross-platform: Win, Mac, Linux...

Both user-tools & C++ library

Interfaces in Python, Perl, Ruby, Java, C#

Supports chemistry, bioinformatics, solid-state…

100+ file formats and variants

O’Boyle et al. J. Cheminf. 2011, 3:33

Chemical Database?

1. Some way to store data (Organize it)

2. Index it

3. Search / filter

4. Visualize results

ChemSpotlight: Indexing Architecture

Spotlight + Open Babel + ~300 lines of code

http://chemspotlight.openmolecules.net/

ChemSpotlight: “Un” Database

Use the system-wide search database

No (Visible) Database!

Index files in-place

Includes textual data (e.g., chemical names, formulas, etc.)

Multiple retrieval and filtering interfaces (i.e., any third-party search tool works)
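
The "index files in-place" idea can be illustrated with a minimal sketch. This is not ChemSpotlight's code (which plugs into Spotlight); it only shows the general pattern of leaving files where they are and keeping a separate index of metadata. The extension list and the functions `build_index` and `search` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical extension list for illustration; the set of formats actually
# handled by ChemSpotlight is not stated on this slide.
CHEM_EXTENSIONS = {".sdf", ".pdb", ".cif", ".xyz", ".mol"}


def build_index(root):
    """Walk `root` and record metadata for chemistry files without moving them."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in CHEM_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # The index stores only metadata; the file itself stays in place.
            index[path] = {
                "extension": ext,
                "size": info.st_size,
                "modified": info.st_mtime,
            }
    return index


def search(index, extension):
    """Return the sorted paths of indexed files with the given extension."""
    return sorted(p for p, meta in index.items() if meta["extension"] == extension)
```

Because the index is separate from the data, it can be discarded and rebuilt at any time without touching the original files.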


So What’s Stored / Perceived

Formula, mass, SMILES, InChI

net_sourceforge_openbabel_Formula = C21H36N7O8S

Fingerprints, number of atoms, bonds, residues

PDB, SDF keywords, properties

Calculation keywords:

kMDItemComment = "Gaussian 09 #n B3LYP/6-31G(d) Opt"

Calculation results (HOMO, LUMO, Dipole Moment)

net_sourceforge_chemspotlight_DipoleMoment = 3.5
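
Since these attributes live in the system-wide Spotlight index, they can in principle be queried with the macOS `mdfind` command-line tool. The sketch below only builds the query string and argument list; it does not run anything. The attribute name is taken from the slide, while the helper names `spotlight_query` and `mdfind_command` and the example threshold are assumptions made for illustration.

```python
def spotlight_query(attribute, op, value):
    """Build a Spotlight query expression, e.g. attr > 3.0 or attr == "text"."""
    if op not in {"==", "!=", "<", ">", "<=", ">="}:
        raise ValueError(f"unsupported operator: {op}")
    if isinstance(value, str):
        # Quote string values and escape embedded backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{attribute} {op} "{escaped}"'
    return f"{attribute} {op} {value}"


def mdfind_command(query, folder=None):
    """Return an argument list for mdfind, optionally limited to one folder."""
    cmd = ["mdfind"]
    if folder is not None:
        cmd += ["-onlyin", folder]
    cmd.append(query)
    return cmd


# Example: files whose indexed dipole moment exceeds a hypothetical threshold.
query = spotlight_query("net_sourceforge_chemspotlight_DipoleMoment", ">", 3.0)
command = mdfind_command(query)
```

On a Mac with the importer installed, the resulting list could be passed to `subprocess.run`; whether a given attribute is populated depends on what the importer extracted from each file.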

ChemSpotlight “Un” Database


How Do We Visualize?

“QuickLook” previews

New code ~800 lines

Generate SDF, PDB, CIF (if needed)

Pass off to ChemDoodle Web Components

Pseudo-3D, interactive JS + HTML5

… or SVG generation from Open Babel

http://web.chemdoodle.com/

Organic Heterojunction Solar Cells

[Figure: device schematic. Labels: light, Transparent Electrode, p-type material, n-type material, Reflective Electrode, +/- Circuit.]

[Figure: energy-level diagram. Labels: Optical Excitation, e-, h+, ΔE ≥ Exciton Binding Energy, Effective Heterojunction Bandgap, Anode, Cathode, Hole Conducting Polymer, Electron Conductor (Nanoparticle).]

Pipeline Model for Finding New Molecules

[Figure: screening pipeline. Labels: Monomers, >10^6 Possible Structures, Fast Screening, Slower, Electronic Properties, Optical Properties, Synthetic Score, ~9 minutes.]

J. Phys. Chem. C 2011, vol. 115, p. 16200

New Genetic Algorithm Approach

Rather than directly driving calculations & waiting for results

Check Spotlight for new results

“What are top HOMO energies?”

Update GA, generate new candidates, submit new jobs
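
The loop described above can be sketched as follows. This is an illustrative toy, not the GA used in the talk: the index is stood in for by a plain dictionary mapping candidates to HOMO energies, candidates are tuples of monomer names, "top" is assumed to mean highest HOMO energy, and the function names and mutation scheme are hypothetical.

```python
import random


def top_candidates(results, k):
    """Return the k candidates with the highest HOMO energy (assumed ranking)."""
    return sorted(results, key=results.get, reverse=True)[:k]


def next_generation(parents, monomers, size, rng, exclude):
    """Create new candidates by swapping one monomer in a random parent."""
    children = set()
    attempts = 0
    # Cap the attempts so the loop ends even if few new candidates exist.
    while len(children) < size and attempts < 100 * size:
        attempts += 1
        child = list(rng.choice(parents))
        child[rng.randrange(len(child))] = rng.choice(monomers)
        child = tuple(child)
        if child not in exclude:
            children.add(child)
    return sorted(children)


def ga_step(indexed_results, known, monomers, k, size, rng):
    """One non-blocking step: absorb newly indexed results, propose new jobs."""
    new = {c: e for c, e in indexed_results.items() if c not in known}
    if not new:
        return []  # nothing new in the index yet; do not wait
    known.update(new)
    parents = top_candidates(known, k)
    return next_generation(parents, monomers, size, rng, exclude=known)
```

A driver would call `ga_step` periodically, submit the returned candidates as new jobs, and let the indexer pick up the finished output files; the GA never blocks on an individual calculation.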

Scaling Up the Polymer Solar Search

[Plot: LUMO Energy (eV), axis from −3 to 0, vs. HOMO Energy (eV), axis from −9.5 to −6.5]

2nd Gen. Search:

680 Monomers

2800+ Fragments

Search Space: 500+ million oligomers

~9 minutes per core
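
A quick back-of-the-envelope calculation shows why exhaustive enumeration of this space is impractical. It assumes that "~9 minutes per core" means roughly 9 minutes of one core per oligomer, and it uses the lower bound of 500 million oligomers; both readings are assumptions, not figures confirmed beyond the slide.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def core_years(n_candidates, minutes_each):
    """Total compute cost, in core-years, of evaluating every candidate once."""
    return n_candidates * minutes_each / MINUTES_PER_YEAR


# 500 million oligomers at an assumed ~9 core-minutes each.
exhaustive = core_years(500_000_000, 9)
print(f"{exhaustive:,.0f} core-years")
```

Under these assumptions the total comes to several thousand core-years, which is the motivation for a guided search rather than brute force.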


Take-Home Messages

“Big Data” is a Big Headache

ChemSpotlight & Un-Databases Work!

Keep data as native files w/ separate index

Integrate into user-friendly tools

Sell to users: “What’s in it for me?”

Indexing, retrieval

Improved workflows

Dr. Noel O’Boyle, U.C. Cork, Ireland

Casey Campbell, Pitt (2010)

Marcus Hanwell, Pitt / Kitware