Cheminformatics and Chemical Information Matt Sundling Advisor: Prof. Curt Breneman Department of Chemistry and Chemical Biology/Center for Biotechnology and Interdisciplinary Studies Rensselaer Polytechnic Institute

Page 1: Cheminformatics and Chemical Informationreccr.chem.rpi.edu/Presentations/biotech_seminar_matt...2005/10/13  · Cheminformatics is about collecting, storing, and analyzing [usually

Cheminformatics and Chemical Information

Matt Sundling

Advisor: Prof. Curt Breneman

Department of Chemistry and Chemical Biology/Center for Biotechnology and Interdisciplinary Studies

Rensselaer Polytechnic Institute

Page 2

Rensselaer Exploratory Center for Cheminformatics Research

http://reccr.chem.rpi.edu/

Many thanks to Theresa Hepburn!

Page 3

Upcoming lecture plug:

Prof. Curt Breneman

DSES Department

November 15th

Advances in Cheminformatics: Applications in Biotechnology, Drug Design and Bioseparations

Page 4

Cheminformatics is about collecting, storing, and analyzing [usually large amounts of] chemical data.

•Pharmaceutical research

•Materials design

•Computational/Automated techniques for analysis

•Virtual high-throughput screening (VHTS)

QSAR - quantitative structure-activity relationship

Molecular Structures → Model → Activity

Page 5

[Figure: molecular structure drawing (atom labels N, N, Cl, O) and nucleotide sequence AAACCTCATAGGAAGCATACCAGGAATTACATCA…]

Molecular Structures → Model → Activity

Page 6

[Figure: molecular structure drawing (atom labels N, N, Cl, O) and nucleotide sequence AAACCTCATAGGAAGCATACCAGGAATTACATCA…]

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors

Molecular Structures → Descriptors → Model → Activity

Page 7

Activity: bioactivity, ADME/Tox evaluation, hERG channel effects, P450 isozyme inhibition, anti-malarial efficacy, etc.

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors

+ Activity =

Molecule D1 D2 … Activity (IC50)

molecule #1 21 0.1

molecule #2 33 2.1

molecule #3 10 0.9

Molecular Structures → Descriptors → Model → Activity

Page 8

Goal: Minimize the error: $\sum_i \left( y_i - f(x_i) \right)$

[Plot: two candidate model fits, f1(x) and f2(x)]

Regression Models: linear, multi-linear, higher-order functions, neural networks, etc.

Molecular Structures → Descriptors → Model → Activity
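A minimal sketch of the error-minimization goal on this slide, for the simplest case of one descriptor and a straight-line model. It assumes the common squared form of the error (the slide does not show an exponent), and the descriptor values and activities are hypothetical, not data from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(model, xs, ys):
    """Sum over molecules of (y_i - f(x_i))^2 (assumed squared form of the error)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def make_line(slope, intercept):
    """Return the model f(x) = slope * x + intercept."""
    return lambda x: slope * x + intercept


def make_constant(value):
    """Return a baseline model that predicts the same value for every molecule."""
    return lambda x: value


# Hypothetical descriptor values (x) and measured activities (y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

f1 = make_line(*fit_line(xs, ys))        # least-squares line
f2 = make_constant(sum(ys) / len(ys))    # mean-only baseline

print("error of f1:", squared_error(f1, xs, ys))
print("error of f2:", squared_error(f2, xs, ys))
```

Comparing the two error values is one way to choose between candidate models such as f1(x) and f2(x); richer model families (multi-linear, neural networks) follow the same pattern with a different `model`.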

Page 9

DESCRIPTORS:

Molecular Structures → Descriptors → Model → Activity

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors

Constitutional Descriptors

Electrostatic Descriptors

Quantum-chemical Descriptors

Thermodynamic Descriptors

Page 10

QSAR (of birds) --> Quantitative Bird Structure-Property Relationship (QBSPR):

Bird Species → Descriptors → Model → Activity

Page 11

QSAR (of birds):

QBSPR: I want to understand the relationship between species and flight performance…

QSAR: I want to understand the relationship between compound and acetylcholinesterase (AChE) inhibition…

Bird Species → Descriptors → Model → Activity

Page 12

QSAR (of birds):

QBSPR: What is flight performance? Performance = muscle efficiency × flight time (P = Efficiency × Time)

QSAR: What is AChE inhibition? IC50 is a measure of the concentration required for 50% inhibition.
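To make the IC50 definition concrete, here is a small sketch. It assumes a simple Hill-type dose-response curve and log-linear interpolation between measured points; both are illustrative choices, not something stated in the talk, and the function names are hypothetical.

```python
import math


def hill_inhibition(conc, ic50, hill=1.0):
    """Fractional inhibition (0..1) at concentration `conc` under an assumed Hill model."""
    return conc ** hill / (ic50 ** hill + conc ** hill)


def estimate_ic50(concs, inhibitions):
    """Estimate the concentration giving 50% inhibition.

    Assumes `concs` is ascending and positive, and that `inhibitions`
    (fractions between 0 and 1) increase with concentration.
    Interpolates linearly in log10(concentration) between bracketing points.
    """
    points = list(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 0.5 <= i_hi:
            if i_hi == i_lo:
                return c_lo
            frac = (0.5 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


# By construction, inhibition is exactly one half when conc equals ic50.
print(hill_inhibition(2.0, ic50=2.0))
```

A lower IC50 means less compound is needed to reach 50% inhibition, which is why it serves as the "Activity" column in a QSAR data table.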

Bird Species → Descriptors → Model → Activity

Page 13

QSAR (of birds):

QBSPR: What features or characteristics are related to what we want to predict? What differences between bird species affect what we are trying to determine (flight performance)?

QSAR: What features or characteristics of a molecule affect its ability to inhibit AChE?

Bird Species → Descriptors → Model → Activity

Page 14

DESCRIPTORS (of birds):

Bird Species → Descriptors → Model → Activity

Page 15

DESCRIPTORS (of birds):

Bird Descriptors:

•Height, weight, ‘size’

•Color, Shape of beak, length of talons

•Bone structure

•Muscle structure

•Biokinetics, energetics, wake structure

MODEL:

P ∝ feather length & weight

Bird Species → Descriptors → Model → Activity

Page 16

DESCRIPTORS (of birds):

MODEL:

P ∝ feather length & weight

BETTER MODEL:

P ∝ (wing span)² / wing surface area & body mass / wing surface area

*(wing surface area) is a latent descriptor. Latent = dormant, potential or hidden
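The "better model" combines raw measurements into derived descriptors. The sketch below computes the two ratios named on the slide; in aerodynamics they are usually called aspect ratio and wing loading. The bird measurements are hypothetical placeholders.

```python
def aspect_ratio(wing_span, wing_area):
    """(wing span)^2 / wing surface area."""
    return wing_span ** 2 / wing_area


def wing_loading(body_mass, wing_area):
    """body mass / wing surface area."""
    return body_mass / wing_area


# Hypothetical raw descriptors: span in m, wing area in m^2, mass in kg.
birds = {
    "bird A": {"wing_span": 2.0, "wing_area": 0.5, "body_mass": 4.0},
    "bird B": {"wing_span": 0.3, "wing_area": 0.02, "body_mass": 0.05},
}

for name, d in birds.items():
    derived = (
        aspect_ratio(d["wing_span"], d["wing_area"]),
        wing_loading(d["body_mass"], d["wing_area"]),
    )
    print(name, derived)
```

Wing surface area appears in both ratios without being a model input by itself, which is the sense in which the slide calls it a latent descriptor.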

QUESTIONS: Does your model make sense? What does it say about performance (P)? How does ‘conformation’ affect your model?

Bird Species → Descriptors → Model → Activity

Page 17

QSAR (of molecules):

What features of a molecule are related to my activity? What descriptors can capture that information?

Molecular Structures → Descriptors → Model → Activity

Page 18

DESCRIPTORS:

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors

Constitutional Descriptors

Electrostatic Descriptors

Quantum-chemical Descriptors

Thermodynamic Descriptors

•Analysis requires appropriate data

•Analysis requires relevant data

•Hierarchy of descriptors (data content)

•The same descriptors may not work in all situations (e.g. molecular weight predicts freezing point depression)

$\Delta T_f = K_f \cdot m$, where $m$ is the molality of the solute
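A short worked sketch of why molecular weight is a sufficient descriptor here: for a fixed mass of dissolved solute, the molality depends only on the molar mass. The example assumes a non-electrolyte solute in water and uses the standard cryoscopic constant of water, 1.86 K·kg/mol; the function names are illustrative.

```python
KF_WATER = 1.86  # cryoscopic constant of water, K*kg/mol


def molality(solute_mass_g, molar_mass_g_per_mol, solvent_mass_kg):
    """Moles of solute per kilogram of solvent."""
    return (solute_mass_g / molar_mass_g_per_mol) / solvent_mass_kg


def freezing_point_depression(solute_mass_g, molar_mass_g_per_mol,
                              solvent_mass_kg, kf=KF_WATER):
    """Delta T_f = K_f * m for a non-dissociating solute (assumption)."""
    return kf * molality(solute_mass_g, molar_mass_g_per_mol, solvent_mass_kg)


# Same dissolved mass, two hypothetical molar masses:
# doubling the molar mass halves the depression.
print(freezing_point_depression(10.0, 100.0, 1.0))
print(freezing_point_depression(10.0, 200.0, 1.0))
```

The same descriptor would say little about, for example, enzyme inhibition, which is the slide's point about descriptors not transferring between problems.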

Molecular Structures → Descriptors → Model → Activity

Page 19

DESCRIPTOR HIERARCHY:

•Hierarchy of descriptors (data content)

Molecular formula

‘2D descriptors’ (e.g. connectivity matrices)

‘3D descriptors’ (e.g. stereochemical descriptors)

Wave function of system or PE hypersurface

[Figure: axis labels alongside the hierarchy: INFORMATION CONTENT, COMPLEXITY, OBFUSCATION, COMPUTATION TIME]

Molecular Structures → Descriptors → Model → Activity

Page 20

[Figure: chemical structure drawing (labels: O, H3C, N, N, CH3, N, CH3)]

Page 21

Molecular Surface Properties

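• Electronic Properties

– Electrostatic Potential: $EP(\mathbf{r}) = \sum_{\alpha} \dfrac{Z_{\alpha}}{|\mathbf{r} - \mathbf{R}_{\alpha}|} - \int \dfrac{\rho(\mathbf{r}')\, d\mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|}$

– Electronic Kinetic Energy Density: $K(\mathbf{r}) = -(\psi^{*} \nabla^{2} \psi + \psi \nabla^{2} \psi^{*})$ and $G(\mathbf{r}) = -\nabla \psi^{*} \cdot \nabla \psi$

– Electron Density Gradients: $\nabla\rho \cdot \mathbf{N}$

– Laplacian of the Electron Density: $L(\mathbf{r}) = -\nabla^{2} \rho(\mathbf{r}) = K(\mathbf{r}) - G(\mathbf{r})$

– Local Average Ionization Potential: $PIP(\mathbf{r}) = \sum_{i} \dfrac{\rho_{i}(\mathbf{r})\, \varepsilon_{i}}{\rho(\mathbf{r})}$

– Bare Nuclear Potential (BNP)

– Fukui function: $F^{+}(\mathbf{r}) = \rho_{\mathrm{HOMO}}(\mathbf{r})$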
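As an illustration of how such a property is evaluated at a point on a molecular surface, the sketch below computes only the nuclear term of the electrostatic potential, the sum of Z_α / |r − R_α| over nuclei, in atomic units. Treating this sum as the bare nuclear potential is an assumption here, since the slide lists BNP without a formula. The electronic term is omitted because it needs the electron density ρ, and the coordinates are hypothetical.

```python
import math


def nuclear_potential(point, nuclei):
    """Sum of Z / |r - R| over nuclei (atomic units).

    `point` is an (x, y, z) tuple; `nuclei` is a list of (Z, (x, y, z)).
    The point must not coincide with a nucleus.
    """
    return sum(z / math.dist(point, position) for z, position in nuclei)


# Hypothetical two-atom arrangement: Z = 1 nuclei 4 bohr apart on the x axis.
nuclei = [(1.0, (0.0, 0.0, 0.0)), (1.0, (4.0, 0.0, 0.0))]

# Hypothetical surface points where the property is sampled.
surface_points = [(2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (-1.0, 0.0, 0.0)]

values = [nuclear_potential(p, nuclei) for p in surface_points]
print(values)
```

Sampling a property like this over many surface points yields the distribution of values that a surface-property descriptor then summarizes.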

Page 22

Why use Electron Density-Derived Molecular Descriptors?

• Motivations

– Electron Density Distributions represent molecular properties that are key to biological activities

• Enabling Technologies

– Fast methods (TAE/RECON) for obtaining electron density-derived properties

• Encoding schemes

– Surface Property distributions (Histograms, Wavelets, Dixels)

– Shape/Property hybrid distributions (PEST)

• Synergies

– Complementary to topological descriptors

Page 23

[Figure: tivirapine (structure labels: NH, N, N, Cl, S) shown with three molecular surfaces and their property distributions: electrostatic potential (EP), local average ionization potential (PIP), and bare nuclear potential (BNP). Axis tick values shown: −0.114, 0.0, 0.114; 14.241, 30.016; 0.300, 0.819]

Page 24

Surface Property Distribution Histograms (RECON/TAE) Descriptors

Molecular surface property distributions can be represented as RECON/TAE histogram bin descriptors
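A minimal sketch of turning a surface property distribution into histogram bin descriptors. The fixed range, the number of bins, and the normalization to fractions are assumptions for illustration; the actual RECON/TAE binning scheme is not described on the slide.

```python
def histogram_descriptors(values, lo, hi, n_bins=10):
    """Fraction of surface points whose property value falls in each bin.

    Values outside [lo, hi] are ignored; the top edge belongs to the last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if v < lo or v > hi:
            continue
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Hypothetical property values sampled at points on a molecular surface.
surface_values = [-0.10, -0.05, -0.02, 0.00, 0.01, 0.03, 0.04, 0.09]

descriptors = histogram_descriptors(surface_values, lo=-0.12, hi=0.12, n_bins=6)
print(descriptors)
```

Each bin fraction becomes one column in the descriptor table, so every molecule is described by a vector of the same length regardless of its size.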

Page 25

[Figure: wavelet decomposition tree. The original signal (a0, level 0) is split at each level into an approximation (a1 … a7) and a detail (d1 … d7), from level 1 through level 7]

Page 26

[Figure: four numbered steps: original signal → DWT → wavelet coefficients (a9, d9 … d1) → iDWT → smoothed signal. High-frequency component wavelet coefficients are removed and low-frequency component wavelet coefficients are retained]

Wavelet Decomposition:

– Creates a set of coefficients that represent a waveform.

– Small coefficients may be omitted to compress data.

Wavelet Reconstruction:

16 coefficients of the WCD vector represent surface property densities with >95% accuracy.
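The decompose, truncate, and reconstruct cycle can be sketched with the Haar wavelet, the simplest choice. The talk does not say which wavelet family is used, so Haar is an assumption here, and the signal is a hypothetical stand-in for a surface property distribution.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns a flat list: [approximation, coarsest detail, ..., finest details].
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_detail = [(a - b) / SQRT2 for a, b in pairs]
        approx = [(a + b) / SQRT2 for a, b in pairs]
        details = level_detail + details  # coarser levels go in front
    return approx + details


def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    approx = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level_detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        rebuilt = []
        for a, d in zip(approx, level_detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        approx = rebuilt
    return approx


def retain_low_frequency(coeffs, n_keep):
    """Keep the first n_keep (coarsest) coefficients and zero out the rest."""
    return list(coeffs[:n_keep]) + [0.0] * (len(coeffs) - n_keep)


# Hypothetical 8-point signal.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]

coeffs = haar_dwt(signal)
smoothed = haar_idwt(retain_low_frequency(coeffs, 4))
print(coeffs)
print(smoothed)
```

The retained coefficients play the role of the descriptor vector: a short, fixed-length summary from which an approximation of the original distribution can be rebuilt.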

Page 27

[Figure: original signal (TAE histogram descriptors) → DWT → wavelet coefficient descriptors (WCDs: a7, d7 … d1) → iDWT → reconstructed signal]

Page 28

PEST: Molecular Shape/Property Hybrid Encoding

• PEST (Property-Encoded Surface Translation)

– Adds shape information to encode the spatial relationships of surface properties

Page 29

PEST Molecular Ray Tracing Algorithm

Page 30

Understanding and Interpretation

Structural Descriptors

Physicochemical Descriptors

Topological Descriptors

Geometrical Descriptors

wavelet coefficient descriptors (WCDs): a7, d7

Page 31

Page 32

Conclusions: These things are important!

1. Descriptor classes

2. Selection of descriptors

3. Selection of descriptor representation