Cheminformatics in Drug Discovery and Chemical Genomics Research Weifan Zheng, Ph.D. Associate Professor Department of Pharmaceutical Sciences BRITE Institute, NC Central University Adjunct Associate Professor Department of Medicinal Chemistry University of North Carolina at Chapel Hill UKY Seminar Weifan Zheng, Ph.D.

Page 1:

Cheminformatics in Drug Discovery and Chemical Genomics Research

Weifan Zheng, Ph.D.
Associate Professor
Department of Pharmaceutical Sciences
BRITE Institute, NC Central University

Adjunct Associate Professor
Department of Medicinal Chemistry

University of North Carolina at Chapel Hill

UKY Seminar Weifan Zheng, Ph.D.

Page 2:

Topics to Be Covered

Biotech/Pharma, Orphan Disease, Chemical Genomics

Computational Needs

Compound Collection, Docking, Scoring, Data Analytics

CECCR Cheminformatics Center

Page 3:

Drug Discovery & Development Pipeline

Page 4:

Phases and Costs of Drug Discovery

Page 5:

Drug Discovery Process and the Roles of CADD

[Figure: pipeline timeline. Stages: GR, DR, DD, Preclin, IND, Clinical trials (I, II, III). Other labels: T, H, L, C; H2L, LO, T2H; CADD]

• GR: Genetic Research; DR: Discovery Research; DD: Drug Discovery
• CADD: computer-assisted drug discovery
• ADMET: absorption, distribution, metabolism, elimination, toxicity

Page 6:

Human Genome Project Success

“Genome announcement 'technological triumph'
Milestone in genetics ushers in new era of discovery, responsibility”

CNN, June 26, 2000

Page 7:

Chemogenomics/Chemical Genomics

Chris Austin, F. Collins

Page 8:

Chemical Genomics

Google hit counts (Oct. 16, 2006):
• Chemogenomics: 69,000
• Chemical genomics: 113,000
• Chemical biology: 4,210,000
• Chemical genetics: 104,000

Page 9:

Chemical genetics is a research method that uses small molecules to change the way proteins work—directly in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.


Page 10:

to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data that is useful for defining the cross-section between chemical space and biological activity (and to do so on a genomic scale).

Page 11:

NIH Molecular Library Initiative

[Figure: organization of the MLI]
• Chemical Synthesis Centers: CombiChem, parallel synthesis, DOS; 4 centers + DPI; 100K – 1M compounds
• MLSCN (9+1): 9 centers + 1 NIH intramural; 20 x 10 = 200 assays
• PubChem (NLM)
• ECCR (6): Exploratory Centers
• SAR matrix (compounds, 200 assays)

Page 12:

Biological Assay Data

• Biochemical assays
• Cell-based functional assays
• Phenotypic assays

• Databases
  – PubChem (http://pubchem.ncbi.nlm.nih.gov/)
  – ChemBank (http://chembank.broad.harvard.edu/)
  – WOMBAT (http://sunsetmolecular.com/index.php)
  – Jubilant (http://www.jubilantbiosys.com/)
  – Gvk/Bio (http://www.gvkbio.com/)

Page 14:

High Throughput Chemistry and Screening: Informatics

[Figure: informatics workflow. Labels: Virtual Libraries; Diverse Lib Design; Targeted Lib Design; Combinatorial Synthesis; Real Libraries; HTS; SAR Data; KDD (QSAR, P.R.); Rules; Drug Discovery, Chemical Genomics; Logistics; Scientific]

Page 15:

Topics to Be Covered

Biotech/Pharma, Orphan Disease, Chemical Genomics

Computational Needs

Compound Collection, Docking, Scoring, Data Analytics

CECCR Cheminformatics Center

Page 16:

Challenges in Combinatorial Chemistry

[Figure: scaffold with three variable positions: R1 (3000), R2 (3000), R3 (3000)]

• 3,000^3 / 1,000 per week = ~0.5 million years!!!
• Library Design: rational selection of a subset of building blocks to obtain a maximum amount of information
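The arithmetic behind the "~0.5 million years" estimate can be checked with a short sketch. The 3,000 building blocks per position and the rate of 1,000 per week come from the slide; treating the rate as compounds per week and using 52 weeks per year are assumptions made here.

```python
# Back-of-the-envelope size of a three-position combinatorial library.
# From the slide: 3,000 building blocks at each of R1, R2, R3 and a
# throughput of 1,000 per week. Assumed here: the rate counts compounds,
# and a year has 52 working weeks.
BLOCKS_PER_POSITION = 3_000
POSITIONS = 3
COMPOUNDS_PER_WEEK = 1_000
WEEKS_PER_YEAR = 52

library_size = BLOCKS_PER_POSITION ** POSITIONS      # full enumeration
weeks_needed = library_size / COMPOUNDS_PER_WEEK
years_needed = weeks_needed / WEEKS_PER_YEAR

print(f"{library_size:,} compounds, about {years_needed:,.0f} years")
```

The full enumeration is 2.7 x 10^10 compounds, which is why a designed subset of building blocks is needed.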

Page 17:

Design for Activity: Similarity

• If we know a compound is active, and we want to design a set of compounds that may be active against the same target, we may select
  – A set of compounds that are similar to the active compound

• The similarity principle: similar compounds should have similar biological activity
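A minimal sketch of similarity-based selection follows. The slide does not name a similarity measure; the Tanimoto coefficient on feature sets is a common choice and is assumed here. The compound names and feature sets are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def most_similar(query: set, library: dict, top_n: int = 2) -> list:
    """Return the names of the top_n library compounds most similar to the query."""
    ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
    return ranked[:top_n]

# Hypothetical fingerprints: each compound is a set of feature identifiers.
active = {1, 4, 7, 9, 12}
library = {
    "cpd_A": {1, 4, 7, 9, 13},      # shares 4 of 6 distinct features with the active
    "cpd_B": {2, 3, 5},             # shares none
    "cpd_C": {1, 4, 7, 9, 12, 15},  # shares 5 of 6 distinct features
}
selected = most_similar(active, library)
print(selected)
```

Under the similarity principle, the compounds ranked first would be the ones prioritized for synthesis or screening.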

Page 18:

Molecular Identity and Molecular Similarity

          X1  X2  X3  ...  X20
Str. 1     2   5   1  ...    4
Str. 2     4   7   9  ...    7
Str. 3     1   6   8  ...    6
...
Str. 100   0   3   5  ...    1

[Figure: structures 1, 2, 3 plotted as points in descriptor space; axes X1, X2]

Page 19:

Design for General Application: Diversity

Page 20:

Similarity and Diversity

- MaxiMin (maximize the minimum pairwise distance)
- Minimize $\sum_{i<j} 1/D_{ij}^2$
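The two diversity criteria can be sketched as follows. The greedy MaxiMin procedure, the Euclidean distance, and the five 2D points are assumptions chosen for illustration; the slide names only the objectives, not an algorithm.

```python
import math
from itertools import combinations

def maxmin_select(points, k, seed_index=0):
    """Greedy MaxiMin: repeatedly add the point farthest from the current selection."""
    selected = [seed_index]
    while len(selected) < k:
        candidates = (i for i in range(len(points)) if i not in selected)
        best = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
        )
        selected.append(best)
    return selected

def inverse_square_sum(points, subset):
    """Sum of 1/Dij^2 over all pairs in the subset (lower means more diverse)."""
    return sum(
        1.0 / math.dist(points[i], points[j]) ** 2
        for i, j in combinations(subset, 2)
    )

# Hypothetical compounds in a 2D descriptor space; points 0 and 1 are near-duplicates.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 0.9), (1.0, 1.0)]
picked = maxmin_select(points, 3)
print(picked, inverse_square_sum(points, picked))
```

A subset containing the near-duplicate pair scores far worse on the inverse-square sum, so both criteria steer the selection away from redundant compounds.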

Page 21:

Cluster Hits Obtained by SAGE and Random Sampling

[Figure: bar chart. y-axis: Number of Cluster Hits (0 to 12); x-axis: 5s, 5r, 10s, 10r, 15s, 15r, 20s, 20r, 25s, 25r, 30s, 30r; series: Number of Active Clusters (40, 80, 120, 160, 200)]

Page 22:

Drug Discovery & Development Failures

[Figure: pie chart of reasons for failure]
• Poor PK: 39%
• Efficacy: 29%
• Tox: 21%
• Market: 6%

Venkatesh & Lipper, J. Pharm. Sci. 89, 145-154 (2000)

Page 23:

Multi-Factorial Design

[Figure: plot with a "score" axis running from 0 to 1]

Page 24:

Total Score is the Weighted Sum of Individual Terms

$$E(S) = \sum_i w_i E_i(S)$$
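A minimal sketch of the weighted-sum score, assuming each term is a penalty where lower is better. The term names, weights, and values are hypothetical and serve only to show how the terms combine.

```python
def total_score(term_scores: dict, weights: dict) -> float:
    """E(S) = sum over i of w_i * E_i(S) for one candidate library S."""
    return sum(weights[name] * term_scores[name] for name in weights)

# Hypothetical weights and per-term penalty scores (lower is better).
weights = {"diversity": 0.5, "lipinski": 0.3, "p450": 0.2}
library_a = {"diversity": 0.8, "lipinski": 0.5, "p450": 1.0}
library_b = {"diversity": 0.2, "lipinski": 0.1, "p450": 0.5}

score_a = total_score(library_a, weights)
score_b = total_score(library_b, weights)
print(score_a, score_b)
```

With these numbers library_b has the lower total penalty, so an optimizer minimizing E(S) would prefer it. The weights control how the competing design criteria are traded off.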

Page 25:

[Figure: iterative library optimization. Initial Library, Better Library, Optimal Library (each shown on R1 and R2 axes); Iteration driven by Penalty Scores: Lipinski Properties, P450 Activity, Diversity]

Page 26:

Designed Library Has a Better MW-clogP Distribution

[Figure: two panels with a clogP axis: "Initial ten solutions (undesigned)" and "The final ten solutions (well designed)"]

Page 27:

Molecular Identity and Molecular Similarity

          X1  X2  X3  ...  X20
Str. 1     2   5   1  ...    4
Str. 2     4   7   9  ...    7
Str. 3     1   6   8  ...    6
...
Str. 100   0   3   5  ...    1

[Figure: structures 1, 2, 3 plotted as points in descriptor space; axes X1, X2]

Page 28:

SPE Algorithm (Agrafiotis)

• Iterative Random Sampling

[Figure: points a and b in the Original Space, at distance D(a,b), and in the Embedding Space (2D), at distance D’(a,b)]

If D’ > D, move a, b closer
If D’ < D, move a, b apart
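The pairwise update rule can be sketched as below. This is a simplified illustration, not the published SPE implementation: the linearly decaying learning rate, the step count, the stress measure, and the five 3D points are all assumptions, and the neighborhood cutoff used in the published method is omitted.

```python
import math
import random

def stress(target, coords):
    """Sum of squared differences between original and embedded distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def spe_embed(target, start, n_steps=20000, lam_start=1.0, lam_end=0.01, seed=0):
    """Refine 2D coordinates by adjusting one randomly sampled pair per step."""
    rng = random.Random(seed)
    coords = [list(p) for p in start]
    n = len(coords)
    for step in range(n_steps):
        lam = lam_start + (lam_end - lam_start) * step / n_steps  # decaying rate
        a, b = rng.sample(range(n), 2)
        d_embed = math.dist(coords[a], coords[b])  # D'(a,b) in the embedding
        d_orig = target[a][b]                      # D(a,b) in the original space
        # Negative when D' > D (pull a, b closer), positive when D' < D (push apart).
        scale = lam * 0.5 * (d_orig - d_embed) / (d_embed + 1e-9)
        for k in range(len(coords[a])):
            shift = scale * (coords[a][k] - coords[b][k])
            coords[a][k] += shift
            coords[b][k] -= shift
    return coords

# Hypothetical compounds described by 3D descriptor vectors.
original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
target = [[math.dist(p, q) for q in original] for p in original]
init_rng = random.Random(1)
start = [[init_rng.random(), init_rng.random()] for _ in original]
embedded = spe_embed(target, start)
print(stress(target, start), stress(target, embedded))
```

Each step touches only one pair, so no full distance matrix needs to be processed per iteration. That property is what makes the approach attractive for large compound collections.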

Page 29:

Chemical Space - Compound Collection Comparison

Page 30:

Chemical Space - Compound Collection Comparison

Page 31:

Chemical Space - Compound Collection Comparison

Page 32:

SPE Embedding of ChemSpace

Page 33:

Topics to Be Covered

Biotech/Pharma, Orphan Disease, Chemical Genomics

Computational Needs

Compound Collection, Docking, Scoring, Data Analytics

CECCR Cheminformatics Center

Page 34:

Quantitative Structure-Activity Relationship (QSAR)

Structures  Activity
str1        a1
str2        a2
str3        a3
str4        a4
str5        a5
str6        a6
str7        a7
str8        a8
str9        a9
str10       a10

[Figure: two scatter plots of predicted versus actual activity; q2 = 0.8, R2 = 0.75]

Methods: multiple linear regression (MLR); partial least squares (PLS); artificial neural nets; k-nearest neighbor (kNN)
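As an illustration of how a cross-validated q2 can be computed, the sketch below uses leave-one-out validation of a one-descriptor linear model. The definition q2 = 1 - PRESS/SS with SS taken about the overall mean activity is one common convention and is assumed here; the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def q2_leave_one_out(xs, ys):
    """q2 = 1 - PRESS / SS, predicting each compound from a model built without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

# Hypothetical descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
q2 = q2_leave_one_out(descriptor, activity)
print(round(q2, 3))
```

R2 on an external test set is computed analogously, but with a single model trained on the training set and applied to compounds it has never seen.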

Page 35:

Basic Assumptions of KNN-QSAR Method

• Structurally similar compounds should have similar biological activities

• Biological similarities are often due to similarities of substructures (pharmacophore)

• Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors
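The core prediction step implied by these assumptions can be sketched as a plain k-nearest-neighbor average. This is a simplified illustration only: the descriptor selection that makes the descriptors "pharmacophore-specific" is not shown, and the Euclidean distance, k = 3, and all data are assumptions.

```python
import math

def knn_predict(query, train_x, train_y, k=3):
    """Predict activity as the mean activity of the k nearest training compounds."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training set: two descriptor values per compound and one activity.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [1.0, 1.2, 0.8, 7.0, 7.2, 6.8]

low = knn_predict((0.2, 0.2), train_x, train_y)
high = knn_predict((5.5, 5.5), train_x, train_y)
print(low, high)
```

Each query is assigned the average activity of its structural neighbors, which is the similarity principle applied directly as a predictor.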

Page 36:

Comparison of CoMFA, GA-PLS, and KNN-QSAR

[Figure: bar chart of q2 (0 to 0.9) by dataset: AChE (60), 5HT1A (14), DHFR (23), D1 ANT (29); series: CoMFA/q2-GRS, GA-PLS, kNN-QSAR]

Page 37:

QSAR Based Virtual Screening for GPCR Ligand Design

[Figure: enrichment plot. x-axis: %Screened (0 to 100); y-axis: %Active Retrieved (0 to 100); series: %Random, %Retrieved]

Page 38:

Topics to Be Covered

Biotech/Pharma, Orphan Disease, Chemical Genomics

Computational Needs

Compound Collection, Docking, Scoring, Data Analytics

CECCR Cheminformatics Center

Page 39:

Docking and Scoring

• In the early 1980s, I.D. Kuntz developed the first computerized molecular docking program: DOCK

• GOLD, FRED, GLIDE, FLEXX, AutoDock, ICM

[Figure: X-ray structure]

Page 40:

Our Approach to Derive DT-SCORE

1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface

2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using the perceptron learning algorithm

Page 41:

Flowchart to Derive DT-SCORE

[Flowchart labels: Receptor-ligand Complexes; Tessellation of receptor-ligand interface; Descriptor Generation; Perceptron Learning algorithm; Model Generation & Prediction; Binding constant; DT-SCORE]

Page 42:

Delaunay Tessellation in 2D

• Rigorous definition of nearest neighbors in 2D & 3D space: Delaunay tessellation

• Nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D)
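The 2D case can be illustrated with a brute-force sketch built on the empty-circumcircle property: three points form a Delaunay triangle when no other point lies inside their circumcircle. This is for illustration only; it is far slower than production tessellation codes, and the four points are hypothetical.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center, squared radius) of the circle through a, b, c, or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_triangles(points):
    """Keep every triple whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        others = (p for m, p in enumerate(points) if m not in (i, j, k))
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9 for px, py in others):
            triangles.append((i, j, k))
    return triangles

# Hypothetical atom positions projected onto a plane.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 4.0)]
triangles = delaunay_triangles(points)
print(triangles)
```

In 3D the same idea applies with four points and an empty circumsphere, which yields the tetrahedra (sets of four neighbors) mentioned above.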

Page 43:

Delaunay Tessellation of the Receptor-Ligand Interface

Page 44:

A Detailed View of Active Site Tessellation

[Figure: tetrahedra in the active site with vertices labeled R and L]

An atom is shared by several tetrahedra

Page 45:

3 Types of Tetrahedra at the Receptor-Ligand Interface

[Figure: three tetrahedra labeled RRRL, RRLL, RLLL]

• RLLL: formed by 1 receptor atom and 3 ligand atoms
• RRLL: formed by 2 receptor atoms and 2 ligand atoms
• RRRL: formed by 3 receptor atoms and 1 ligand atom

Each of the above tetrahedron types is further discriminated by atom types on the vertices

Page 46:

Geometrical Descriptors According to Tetrahedron Types

Tetrahedron types: RRRL, RRLL, RLLL

Atom types:  NCNO  ONOS  ……  CNOO  NOCS  ……  COSC  OSXN  ……
Count:          5     3  ……     8     2  ……     4     0  ……
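A minimal sketch of how such occurrence counts could be tallied from a list of interface tetrahedra. The input format (a receptor/ligand flag plus an element symbol per vertex) and the alphabetical ordering of atom types within a key are assumptions made here; the slide does not specify either.

```python
from collections import Counter

def tetrahedron_key(vertices):
    """Map four (origin, atom_type) vertices to a (composition, atom types) descriptor key.

    Returns None for tetrahedra made only of receptor atoms or only of ligand
    atoms, since those do not span the interface.
    """
    n_receptor = sum(1 for origin, _ in vertices if origin == "R")
    n_ligand = len(vertices) - n_receptor
    if n_receptor == 0 or n_ligand == 0:
        return None
    composition = "R" * n_receptor + "L" * n_ligand
    atom_types = "".join(sorted(atom for _, atom in vertices))  # assumed ordering
    return composition, atom_types

def count_descriptors(tetrahedra):
    """Count how often each descriptor key occurs in one receptor-ligand complex."""
    keys = (tetrahedron_key(t) for t in tetrahedra)
    return Counter(key for key in keys if key is not None)

# Hypothetical tetrahedra from one complex.
tetrahedra = [
    [("R", "N"), ("R", "C"), ("R", "O"), ("L", "N")],
    [("R", "O"), ("L", "C"), ("L", "N"), ("L", "O")],
    [("R", "N"), ("R", "C"), ("R", "O"), ("L", "N")],
    [("R", "C"), ("R", "C"), ("R", "N"), ("R", "O")],  # receptor only, skipped
]
counts = count_descriptors(tetrahedra)
print(counts)
```

Each complex then contributes one row of counts, one column per descriptor key, to the modeling table.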

Page 47:

“QSAR” Input Table
(R·L Interaction Pattern – Binding Affinity Relationship Table)

Receptor-Ligand  Binding    RLLL            RRLL            RRRL
Complexes        Affinity   NCNO  ONOS  …   CNOO  NOCS  …   COSC  OSXN  …
(R • L)1         y1         0     3     …   2     8     …   1     3     …
(R • L)2         y2         1     7     …   3     1     …   0     3     …
…                …          …     …     …   …     …     …   …     …     …
(R • L)m-1       ym-1       3     4     …   0     5     …   4     6     …
(R • L)m         ym         2     0     …   2     2     …   1     0     …

Page 48:

Single-Layer Perceptron Network

[Figure: input layer neurons 1, 2, 3, ..., N carrying inputs x1, x2, x3, ..., xN, connected through weights w1, w2, w3, ..., wN to one output layer neuron with activation fn(.) and output y]

xi = input of neuron
wi = weight associated with the input xi
fn(.) = activation function of output neuron
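A minimal sketch of a single-layer perceptron used for regression. The identity activation, the bias term, the delta-rule update, the learning rate, and the toy data are all assumptions made for illustration; the slide only defines the inputs, weights, and activation function.

```python
def predict(weights, bias, x):
    """Output neuron: weighted sum of inputs plus bias, with identity activation."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_perceptron(samples, targets, learning_rate=0.01, epochs=5000):
    """Delta rule: after each sample, nudge the weights along the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            error = y - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical descriptor counts for four complexes, with affinities generated
# from y = 0.5 * x1 + 1.0 * x2 + 2.0 so the fit can be checked.
samples = [(0, 3), (1, 7), (3, 4), (2, 0)]
targets = [5.0, 9.5, 7.5, 3.0]

weights, bias = train_perceptron(samples, targets)
print([round(w, 3) for w in weights], round(bias, 3))
```

With a nonlinear activation the same training loop applies, except that the update is also scaled by the derivative of the activation function.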

Page 49:

Training vs. Test Set Selection and Validation

• Entire dataset (264 complexes)
  – Training set: 80% (214 complexes); model development (q2)
  – Test set: 20% (50 complexes); prediction of the test set (R2)

Page 50:

Model Stability

• Average value from multiple (ca. 80) models

[Figure: q2 (R2), from -2.5 to 1, versus Number of Iterations (0 to 1000); series: Training Set, Test Set]

Page 51:

Actual vs. Predicted Binding Affinity for the Training Set

[Figure: scatter plot of Predicted pKd (0 to 16) versus Actual pKd (0 to 16)]

214 complexes: q2 = 0.73

Page 52:

Actual vs. Predicted Binding Affinity for the Test Set

[Figure: scatter plot of Predicted pKd (0 to 18) versus Actual pKd (0 to 16)]

50 complexes: R2 = 0.61

Page 53:

Acknowledgements

• NCCU and UNC
  – Jerry Ebalunode, Ph.D., BRITE
  – Min Shen, Ph.D., Lexicon
  – Alex Tropsha, Ph.D., Chair of MedChem, UNC-Chapel Hill

• Funding
  – NIH P20HG003898
  – NIH R21GM076059

• GSK
  – Sunny Hung (GSK)
  – George Seibel (JNJ)
  – Ken Kopple (retired)
  – Jeff Wiseman (Locus)

• Lilly
  – Minmin Wang
  – Greg Durst
  – Jim Wikel (retired)