CZ3253: Computer Aided Drug design Drug Design Methods I: QSAR Prof. Chen Yu Zong Tel: 6874-6877...
CZ3253: Computer Aided Drug Design
Drug Design Methods I: QSAR
Prof. Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore

Terminology
• SAR (Structure-Activity Relationships)
  – Circa 19th century?
• QSAR (Quantitative Structure-Activity Relationships)
  – Specific to some biological/pharmaceutical function of a molecule (Absorption, Distribution, Metabolism, Excretion)
  – Crum Brown and Fraser (1868-9)
    • 'constitution' related to biological response
  – LogP
• QSPR (Quantitative Structure-Property Relationships)
  – Relate structure to any physical-chemical property of a molecule

Statistical Models
• Simple
  – Mean, median and variation
  – Regression
• Advanced
  – Validation methods
  – Principal components, covariance
  – Multiple regression
QSAR, QSPR

Modern QSAR
– Hansch et al. (1963)
  • Activity related to 'travel through body': partitioning between varied solvents
– C (minimum dosage required)
– π (hydrophobicity)
– σ (electronic)
– Es (steric)

$$\log(1/C) = a\,\pi + b\,\pi^{2} + c\,\sigma + d\,E_s + \text{const.}$$
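As a rough illustration of how the coefficients a, b, c, d and the constant in a Hansch-type relation can be estimated, the sketch below fits them by ordinary least squares. All substituent values and the "true" coefficients are invented for the example (they are assumptions, not data from the lecture), and `design_matrix` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic substituent constants for 20 hypothetical analogues (not real data).
pi = rng.uniform(-1.0, 2.0, size=20)      # hydrophobicity
sigma = rng.uniform(-0.5, 0.8, size=20)   # electronic
Es = rng.uniform(-2.0, 0.0, size=20)      # steric

# Made-up coefficients (a, b, c, d, const) used only to generate toy activities.
true_coef = np.array([1.2, -0.5, 0.8, 0.3, 2.0])

def design_matrix(pi, sigma, Es):
    """Columns: pi, pi^2, sigma, Es, 1 (matching a, b, c, d, const)."""
    return np.column_stack([pi, pi**2, sigma, Es, np.ones_like(pi)])

X = design_matrix(pi, sigma, Es)
log_inv_C = X @ true_coef + rng.normal(0.0, 0.05, size=20)  # noisy log(1/C)

# Ordinary least-squares estimate of (a, b, c, d, const).
coef, *_ = np.linalg.lstsq(X, log_inv_C, rcond=None)
print(dict(zip(["a", "b", "c", "d", "const"], coef.round(2))))
```

With little noise the fitted coefficients should land close to the values used to generate the data; with real assay data the same fit would be followed by the validation steps discussed later in the slides.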

Choosing Descriptors
• Buffon's Problem
  – Needle length?
  – Needle color?
  – Needle composition?
  – Needle sheen?
  – Needle orientation?

Choosing Descriptors
• Constitutional
  – MW, number of atoms of each element
• Topological
  – Connectivity, Wiener index (sums of bond distances)
  – 2D fingerprints (bit-strings)
  – 3D topographical indices, pharmacophore keys
• Electrostatic
  – Polarity, polarizability, partial charges
• Geometrical descriptors
  – Length, width, molecular volume

Choosing Descriptors
• Chemical
  – Hydrophobicity (LogP)
  – HOMO and LUMO energies
  – Vibrational frequencies
  – Bond orders
  – Total energy
  – ΔG, ΔS, ΔH

Statistical Methods
• 1-D analysis
• Large dimension sets require decomposition techniques
  – Multiple Regression
  – PCA
  – PLS
• Connecting a descriptor with a structural element so as to interpolate and extrapolate data

Simple Error Analysis (1-D)
• Given N data points
  – Mean
  – Variance
  – Regression
[Figure: regression plot with axis labels y_calc, y_obs, x_calc, x_obs]

$$R = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}$$

Simple Error Analysis (1-D)
• Given N data points
  – Regression

$$\Delta y_i = y_i^{\text{obs}} - y_i^{\text{calc}} \qquad (\Delta y = \text{residual})$$

[Figure: regression plot comparing (x^obs, y^obs) with (x^calc, y^calc)]

Simple Error Analysis (1-D)
• Given N data points
  – 0 (Poor) < R² < 1 (Good)

$$R^{2} = \frac{SSR}{N\,\mathrm{Std}(Y)^{2}} = \left(\frac{\mathrm{Cov}(X,Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}\right)^{2}, \qquad SSR = \sum_{i=1}^{N}\left(y_{\text{calc},i} - \bar{y}\right)^{2}$$

Correlation between fluctuations:

$$\mathrm{Cov}(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

$$\mathrm{Std}(Y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^{2}}$$
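The quantities on this slide can be checked numerically. The following is a minimal sketch on five invented data points, assuming the 1/N normalisation for Std and Cov; it computes R from the covariance and confirms that the same R² is obtained from the explained sum of squares of the least-squares line.

```python
import numpy as np

# Toy data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
N = len(x)

x_mean, y_mean = x.mean(), y.mean()
std_x = np.sqrt(np.sum((x - x_mean) ** 2) / N)     # 1/N normalisation
std_y = np.sqrt(np.sum((y - y_mean) ** 2) / N)
cov_xy = np.sum((x - x_mean) * (y - y_mean)) / N   # correlation between fluctuations

R = cov_xy / (std_x * std_y)

# Least-squares line and the sum of squares it explains.
m = cov_xy / std_x**2
b = y_mean - m * x_mean
y_calc = m * x + b
SSR = np.sum((y_calc - y_mean) ** 2)

R2_from_ssr = SSR / (N * std_y**2)
print(R, R**2, R2_from_ssr)
```

The two routes to R² agree only for a least-squares fit with an intercept; for other models the sum-of-squares form is the one to use.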

Correlation vs. Dependence?
• Correlation
  – Two or more variables/descriptors may correlate to the same property of a system
• Dependence
  – When the correlation can be shown to be due to a change in one being caused by a change in the other
• Example: an elephant's head and legs
  – Correlation exists between size of head and legs
  – The size of one does not depend on the size of the other

Quantitative Structure Activity/Property Relationships (QSAR, QSPR)
• Discern relationships between multiple variables (descriptors)
• Identify connections between structural traits (type of subunits, bond angles, local components) and descriptor values (e.g. activity, LogP, % denatured)

Pre-Qualifications
• Size
  – Minimum of FIVE samples per descriptor
• Verification
  – Variance
  – Scaling
  – Correlations

QSAR/QSPR Pre-Qualifications
• Variance
  – Coefficient of Variation ("Spread")

$$\frac{\sigma_x}{\bar{x}} = \frac{\text{Standard Deviation}}{\text{Mean}}$$

QSAR/QSPR Pre-Qualifications
• Scaling
  – Standardizing or normalizing descriptors to ensure they have equal weight (in terms of magnitude) in subsequent analysis

QSAR/QSPR Pre-Qualifications
• Scaling
  – Unit Variance (Auto Scaling)
    • Ensures equal statistical weights (initially)

$$x_i' = \frac{x_i}{\sigma}, \qquad \sigma' = 1$$

  – Mean Centering

$$x_i' = x_i - \bar{x}, \qquad \bar{x}' = 0$$
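The two scaling operations can be sketched in a few lines. The descriptor table below is invented (think of the columns as MW, LogP and dipole moment), and the population standard deviation is assumed. `X_auto` combines both steps, which is how autoscaling is commonly applied in practice.

```python
import numpy as np

# Toy descriptor table: rows are molecules, columns are descriptors (invented values).
X = np.array([
    [180.2, 1.2, 2.5],
    [250.3, 3.1, 1.1],
    [310.4, 2.4, 4.0],
    [150.1, 0.3, 0.7],
])

mean = X.mean(axis=0)
std = X.std(axis=0)            # population standard deviation per descriptor

X_unit = X / std               # unit variance: every column now has sigma' = 1
X_centered = X - mean          # mean centering: every column now averages 0
X_auto = X_centered / std      # both steps together

print(X_unit.std(axis=0), X_centered.mean(axis=0).round(12))
```

Without this step a descriptor such as molecular weight, whose raw values are two orders of magnitude larger than LogP, would dominate any variance-based analysis.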

QSAR/QSPR Pre-Qualifications
• Correlations
  – Remove correlated descriptors
  – Keep correlated descriptors so as to reduce data set size
  – Apply math operation to remove correlation (PCR)

Correlation between the i-th and j-th descriptors:

$$r_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\mathrm{Std}(X_i)\,\mathrm{Std}(X_j)} = \frac{\sum_{k}^{M}\left(X_{i,k}-\bar{X}_i\right)\left(X_{j,k}-\bar{X}_j\right)}{\sqrt{\sum_{k}^{M}\left(X_{i,k}-\bar{X}_i\right)^{2}\,\sum_{k}^{M}\left(X_{j,k}-\bar{X}_j\right)^{2}}}$$

$$-1 \le r_{ij} \le 1$$

r_ij = 1 (100% positive correlation) or r_ij = −1 (100% negative correlation): OVERREPRESENTATION
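One way to act on the "remove correlated descriptors" option is a greedy filter on the descriptor correlation matrix. This is only a sketch: the function name `drop_correlated`, the 0.95 threshold and the synthetic descriptors are all assumptions made for the example, not part of the lecture.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedy filter: keep a descriptor only if its |r_ij| with every
    already-kept descriptor is at or below the threshold."""
    r = np.corrcoef(X, rowvar=False)   # M x M matrix of r_ij (columns = descriptors)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(r[j, i]) <= threshold for i in keep):
            keep.append(j)
    return keep, r

rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = rng.normal(size=50)
# Column 1 is almost a multiple of column 0; column 3 is its exact negative.
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=50), b, -a])

keep, r = drop_correlated(X)
print(keep)
```

Both the positively correlated copy and the perfectly anti-correlated column are discarded, since either sign of correlation over-represents the same information.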

QSAR/QSPR Pre-Qualifications
• Correlations

QSAR/QSPR Scheme
• Goal
  – Predict what happens next (extrapolate)!
  – Predict what happens between data points (interpolate)!

QSAR/QSPR Scheme
• Types of Variable
  – Continuous
    • Concentration, occupied volume, partition coefficient, hydrophobicity
  – Discrete
    • Structural (1: methyl group substituted, 0: no methyl group substitution)

QSAR/QSPR Principal Components Analysis
• Reduces dimensionality of descriptors
• Principal components are a set of vectors representing the variance in the original data

Principal components – reducing the dimensionality of a dataset
[Figure: scatter plot of correlated data on x and y axes]
Clearly there is a relationship between x and y: a high correlation. We can define a new variable z = x + y such that we can express most of the variation in the data as the new variable z. This new variable is a principal component.

$$p_i = \sum_{j=1}^{v} c_{i,j}\, x_j$$

p_i is the i-th principal component and c_{i,j} is the coefficient of the variable x_j. There are v such variables.
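The z = x + y idea can be seen directly on synthetic data. In the sketch below (data invented, noise level an assumption) the first principal component of two strongly correlated variables comes out with nearly equal coefficients on x and y, i.e. proportional to x + y, and carries almost all of the variance.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)      # y strongly correlated with x

D = np.column_stack([x, y])
D = D - D.mean(axis=0)                  # mean-center both variables

cov = np.cov(D, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder: largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

c1 = eigvecs[:, 0]                      # coefficients c_{1,j} of the first PC
p1 = D @ c1                             # p_1 = sum_j c_{1,j} x_j for every sample
explained = eigvals[0] / eigvals.sum()
print(c1, explained)
```

The sign of an eigenvector is arbitrary, so the coefficients may come out as both positive or both negative; only their relative size matters.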

QSAR/QSPR-Principal Components Analysis
• Geometric Analogy (3-D to 2-D PCA)
[Figure: points plotted on x, y, z axes]

$$\tilde{O} = \begin{pmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \\ z_1 & z_2 & \cdots & z_N \end{pmatrix}$$

Principal components
PCA is the transformation of a set of correlated variables to a set of orthogonal, uncorrelated variables called principal components. These new variables are linear combinations of the original variables, in decreasing order of importance.

$$r_{ik} = Y_{ik} - \bar{Y}_i = \sum_{p=1} b_{ip}\, t_{pk}$$

Labels on the equation: data matrix; loadings (measure of the variation between variables); scores (measure of the variation between samples); eigenvalue.

QSAR/QSPR Principal Components Analysis
• Formulate matrix
• Diagonalize matrix
• Eigenvectors are the principal components
  – These principal components (new descriptors) are a linear combination of the original descriptors
• Eigenvalues represent variance
  – Largest accounts for greatest % of data variance
  – Next corresponds to second greatest and so on

QSAR/QSPR-Principal Components Analysis
• Formulate matrix (several types)
  – Correlation or covariance (N x P)
    • N is number of molecules
    • P is number of descriptors
  – Variance-covariance matrix (P x P)
• Diagonalize (rotate) matrix

$$\tilde{A} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{np} \end{pmatrix}, \qquad A_{vc} = \tilde{A}^{T}\tilde{A}$$

QSAR/QSPR-Principal Components Analysis
• Eigenvectors (Loadings)
  – Represent the contribution from each original descriptor to a PC (new descriptor)
    • # columns = # of descriptors
    • # rows = # of descriptors OR # of molecules
• Eigenvalues
  – Indicate which PC is most important (representative of original descriptors)
    • Benzene has 2 non-zero and 1 zero eigenvalue (planar)

QSAR/QSPR-Principal Components Analysis
• Scores
  – Graphing each object/molecule in the space of 2 or more PCs
    • # rows = # of objects/molecules
    • # columns = # of descriptors OR # of molecules
For benzene this corresponds to a graph in the PC1 (x') and PC2 (y') system

Principal components
[Figure: data on x and y axes with the rotated PC1 and PC2 axes overlaid]
The PCs each maximise the variance in the data in orthogonal directions and are ordered by size.
Usually only a few components are needed to explain most (>90%) of the variance in the data, or the properties are not relevant.
The first step is to calculate the variance-covariance matrix from the data.

Principal components
[Figure: data on x and y axes with the rotated PC1 and PC2 axes overlaid]
If there are s observations, each of which contains v values, the (mean-centered) data can be represented by a matrix D with s rows and v columns.
The variance-covariance matrix is Z = DᵀD (a v x v matrix, up to a constant normalisation factor).
The eigenvectors of Z are the principal components. Z is a square symmetric matrix, so the eigenvectors are orthogonal. Usually the matrix is diagonalised to obtain the eigenvectors (the weightings for the properties) and eigenvalues (the explained variance).
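The formulate, diagonalise and rank steps can be sketched as follows. The data are synthetic (five properties driven by two hidden factors, an assumption chosen so that two components dominate), D is arranged with one row per observation, and Z is divided by s − 1 so that the eigenvalues are variances; that scaling does not change the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(3)
s, v = 30, 5                                  # 30 observations, 5 properties
latent = rng.normal(size=(s, 2))              # two hidden factors (toy data)
D = latent @ rng.normal(size=(2, v)) + 0.05 * rng.normal(size=(s, v))

D = D - D.mean(axis=0)                        # mean-center each property
Z = D.T @ D / (s - 1)                         # v x v variance-covariance matrix

eigvals, eigvecs = np.linalg.eigh(Z)          # Z is symmetric, so eigh applies
order = np.argsort(eigvals)[::-1]             # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()           # fraction of variance per PC
scores = D @ eigvecs                          # s x v: each molecule in PC space

print(explained.round(3))
```

Columns of `eigvecs` are the loadings (weightings of the properties), and `scores` holds the coordinates used when molecules are plotted in PC1/PC2 space.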

Principal components
The output looks like this:

Eigenvalues (explain % variance):   80    10    5     3     2
Properties:
  p1                                .2    .3    .4    .1    .1
  p2                                .01   .02   .3    .4    .5
  p3                                .02   .03   .1    .2    .4
  p4                                .03   .4    .4    .04   .3
  p5                                .3    .5    .5    .05   .3

Multiply the property value for the molecule by this, for each eigenvalue.
Can do regression on the PCs, e.g. V = 0.3 PC1(0.1) + 0.2 PC2(0.1) + 0.4(0.2)
So we've reduced a 5-property problem to a two-property problem.

QSAR on SYBYL (Tripos Inc.)
[Screenshots of SYBYL QSAR sessions; only the slide annotations below survive.]
• 10D → 3D
• Eigenvalues → explanation of variance in data
• Each point corresponds to a column (# points = # descriptors) in the original data
  – Proximity → correlation
• Each point corresponds to a row of the original data (i.e. # points = # molecules), or a graph of molecules in PC space
  – Labeled points: He, Naphthalene, H2O
  – Molecular size; small acting big
  – Proximity → similarity
• Outlier

QSAR/QSPR-Regression Types
• Principal Component Analysis

Non-Linear Mappings
• Calculate "distance" between points in N-dimensional descriptor/parameter space
  – Euclidean
  – City-block distances
• Randomly assign compounds in the set to points in a 2-D or 3-D space
• Minimize difference (optimal N-D → 2D plot)

Non-Linear Mappings
• Advantages
  – Non-linear
  – No assumptions!
  – Chance groupings unlikely (a 2D group is likely an N-D group)
• Disadvantages
  – Dependence on initial guess (use PCA scores to improve)

QSAR/QSPR-Regression Types
• Multiple Regression (MR)
• PCR
• PLS

QSAR/QSPR-Regression Types
• Linear Regression
  – Minimize difference between calculated and observed values (residuals)

$$y = m x + b$$

$$m = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}, \qquad b = \bar{y} - m\,\bar{x}$$

• Multiple Regression

$$y = \sum_{i=1}^{N} m_i\, x_i + B$$
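Both regression forms can be written in a few lines. The sketch below implements the closed-form slope and intercept for the single-descriptor case and uses a least-squares solver for the multi-descriptor case; the function names `fit_line` and `fit_multiple` and the noise-free toy data are assumptions for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope m and intercept b for y = m*x + b."""
    x_bar, y_bar = x.mean(), y.mean()
    m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - m * x_bar
    return m, b

def fit_multiple(X, y):
    """Least-squares coefficients m_i and intercept B for y = sum_i m_i*x_i + B."""
    A = np.column_stack([X, np.ones(len(X))])   # extra column of ones carries B
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Single descriptor: points generated exactly from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
m, b = fit_line(x, y)

# Two descriptors: points generated exactly from y = 3*x1 - 2*x2 + 0.5.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y2 = X @ np.array([3.0, -2.0]) + 0.5
m_vec, B = fit_multiple(X, y2)
print(m, b, m_vec, B)
```

Because the toy data contain no noise, both fits recover the generating coefficients; with real data the residuals would be examined as described in the post-qualification slides.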

QSAR/QSPR-Regression Types
• Principal Component Regression
  – Regression, but with principal components substituted for the original descriptors/variables
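A minimal sketch of principal component regression, assuming mean-centered descriptors and an eigen-decomposition of XᵀX for the PCA step. The function name `pcr_fit` and the synthetic data (four descriptors, one of them nearly a copy of another) are hypothetical choices for the example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PC scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(eigvals)[::-1][:n_components]
    P = eigvecs[:, order]                       # loadings of the kept PCs
    T = Xc @ P                                  # scores replace the descriptors
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    beta = P @ gamma                            # map back to original descriptors
    intercept = y_mean - x_mean @ beta
    return beta, intercept

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=40)  # nearly redundant descriptor
y = X @ np.array([1.0, 0.5, -1.0, 1.0]) + 2.0 + 0.05 * rng.normal(size=40)

beta, intercept = pcr_fit(X, y, n_components=3)
y_calc = X @ beta + intercept
rmse = np.sqrt(np.mean((y - y_calc) ** 2))
print(rmse)
```

Dropping the smallest component removes the direction in which the two near-duplicate descriptors differ, which is how PCR sidesteps the correlation problem raised in the pre-qualification slides.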

QSAR/QSPR-Regression Types
• Partial Least Squares
  – Cross-validation determines number of descriptors/components to use
  – Derive equation
  – Use bootstrapping and t-test to test coefficients in QSAR regression

QSAR/QSPR-Regression Types
• Partial Least Squares (a.k.a. Projection to Latent Structures)
  – Regression of a regression
    • Provides insight into variation in the x's (b_{i,j}'s, as in PCA) AND the y's (a_i's)
  – The t_i's are orthogonal
  – M = (# of variables/descriptors OR # of observations/molecules, whichever is smaller)

$$y = \sum_{i}^{N} a_i\, t_i, \qquad t_i = \sum_{j}^{M} b_{ij}\, x_j$$
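The "regression of a regression" structure can be made concrete with a single-response PLS fit. The sketch below uses the NIPALS procedure, which is one common way to compute PLS and is swapped in here as an assumption, since the slides do not name an algorithm. `pls1_fit` is a hypothetical helper, the data are synthetic, and the weight vectors `w` act on the deflated descriptors rather than being the b_ij of the slide directly.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS. Returns coefficients on X, the intercept and the scores T."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q, T = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)       # weight vector: direction most covariant with y
        t = Xk @ w                      # latent score t_i for every molecule
        tt = t @ t
        p = Xk.T @ t / tt               # X loading for this component
        qa = yk @ t / tt                # y loading (plays the role of a_i)
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - qa * t                # deflate y
        W.append(w); P.append(p); q.append(qa); T.append(t)
    W, P, q, T = np.array(W).T, np.array(P).T, np.array(q), np.array(T).T
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta, T

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

beta, intercept, T = pls1_fit(X, y, n_components=2)
print(beta, intercept)
```

The columns of `T` are mutually orthogonal, matching the statement that the t_i's are orthogonal, and using as many components as descriptors reproduces ordinary multiple regression.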

QSAR/QSPR-Regression Types
• PLS is NOT MR or PCR in practice
  – PLS is MR with cross-validation
  – PLS is faster
    • It couples the target representation (QSAR generation) and component generation, while PCA and PCR keep them separate
• PLS is well suited to multivariate problems

QSAR/QSPR Post-Qualifications
• Confidence in Regression
  – TSS: Total Sum of Squares
  – ESS: Explained Sum of Squares
  – RSS: Residual Sum of Squares

$$TSS = ESS + RSS$$

$$R^{2} = \frac{ESS}{TSS} = \begin{cases} 1 & \text{(100\% explanation of data)} \\ 0 & \text{(no explanation of data)} \end{cases}$$

$$TSS = \sum_{i}^{N}\left(y_i - \bar{y}\right)^{2}, \qquad ESS = \sum_{i}^{N}\left(y_{\text{calc},i} - \bar{y}\right)^{2}, \qquad RSS = \sum_{i}^{N}\left(y_i - y_{\text{calc},i}\right)^{2}$$
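The decomposition can be verified numerically. This sketch fits a straight line to six invented points with `numpy.polyfit` and checks that the explained and residual sums of squares add up to the total, which holds for a least-squares fit that includes an intercept.

```python
import numpy as np

# Toy data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.1])

m, b = np.polyfit(x, y, 1)              # least-squares line
y_calc = m * x + b
y_bar = y.mean()

TSS = np.sum((y - y_bar) ** 2)          # total
ESS = np.sum((y_calc - y_bar) ** 2)     # explained by the model
RSS = np.sum((y - y_calc) ** 2)         # left in the residuals
R2 = ESS / TSS
print(TSS, ESS + RSS, R2)
```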

QSAR/QSPR Post-Qualifications
• Confidence in Prediction (Predictive Error Sum of Squares)

$$Q^{2} = 1 - \frac{PRESS}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}, \qquad PRESS = \sum_{i=1}^{N}\left(y_i - y_{\text{calc},i}\right)^{2}$$
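For Q² to measure prediction rather than fit, each y_calc,i in PRESS is usually computed from a model that did not see point i. The sketch below assumes a leave-one-out scheme and a straight-line model; both are assumptions for illustration, and `q2_leave_one_out` is a hypothetical helper name.

```python
import numpy as np

def q2_leave_one_out(x, y):
    """Q^2 with PRESS accumulated from leave-one-out predictions of a line fit."""
    n = len(x)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                    # drop point i
        m, b = np.polyfit(x[keep], y[keep], 1)      # refit without it
        press += (y[i] - (m * x[i] + b)) ** 2       # error on the omitted point
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    return q2, press

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.1])
q2, press = q2_leave_one_out(x, y)

# R^2 of the fit on all points, for comparison.
m, b = np.polyfit(x, y, 1)
r2 = 1.0 - np.sum((y - (m * x + b)) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(q2, 3), round(r2, 3))
```

Q² never exceeds R² for a least-squares model evaluated this way, and a large gap between the two is a common warning sign of overfitting.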

QSAR/QSPR Post-Qualification
• Bias?
  – Bootstrapping
• Choosing best model?
  – Cross-validation

QSAR/QSPR Post-Qualification
• Bootstrapping
  – ASSUME calculated data is experimental/observed data
  – Randomly choose N data (allowing for multiple picks of the same data)
  – Re-generate parameters/regression
  – Repeat M times
  – Average over M bootstraps
  – Compare (calculate residual)
    • If close to zero then no bias
    • If large then bias exists
M is typically 50-100
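The bootstrap steps above can be sketched for a straight-line model. This version resamples whole (x, y) cases with replacement, which is one common variant and an assumption here; the function name `bootstrap_bias`, the synthetic data and M = 100 are likewise illustrative choices.

```python
import numpy as np

def bootstrap_bias(x, y, M=100, seed=0):
    """Case-resampling bootstrap of a straight-line fit.
    Returns (mean bootstrap parameters - full-data parameters, their spread)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    reference = np.polyfit(x, y, 1)            # [slope, intercept] from all data
    boot = np.empty((M, 2))
    for k in range(M):
        idx = rng.integers(0, N, size=N)       # N picks, repeats allowed
        boot[k] = np.polyfit(x[idx], y[idx], 1)
    return boot.mean(axis=0) - reference, boot.std(axis=0)

rng = np.random.default_rng(6)
x = np.linspace(0.0, 10.0, 20)
y = 1.5 * x + 2.0 + 0.3 * rng.normal(size=20)  # synthetic data

bias, spread = bootstrap_bias(x, y)
print(bias, spread)
```

A bias that is small compared with the bootstrap spread suggests the fitted parameters are not being driven by a handful of points.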

QSAR/QSPR Post-Qualification
• Cross-Validation (used in PLS)
  – Remove one or more pieces of input data
  – Re-derive QSAR equation
  – Calculate omitted data
  – Compute root-mean-square error to evaluate efficacy of model
    • Typically 20% of data is removed for each iteration
    • The model with the lowest RMS error has the optimal number of components/descriptors
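A sketch of the cross-validation loop, removing 20% of the data per iteration (five folds) and using the RMS error to pick how many descriptors to include. Multiple regression on the first k descriptor columns stands in for the model here; the synthetic data, in which only the first two descriptors matter, and the helper name `cv_rmse` are assumptions for the example.

```python
import numpy as np

def cv_rmse(X, y, n_folds=5, seed=0):
    """Root-mean-square error of omitted data over n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        A = np.column_stack([X[train], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)     # re-derive equation
        pred = np.column_stack([X[test], np.ones(len(test))]) @ coef
        sq_err += np.sum((y[test] - pred) ** 2)                 # error on omitted data
    return np.sqrt(sq_err / len(y))

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.2 * rng.normal(size=60)   # 2 informative descriptors

rmse = {k: cv_rmse(X[:, :k], y) for k in range(1, 7)}
best_k = min(rmse, key=rmse.get)
print(rmse, best_k)
```

The error drops sharply once both informative descriptors are included and then flattens; adding the remaining descriptors changes the cross-validated error very little.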

QSPR Example
• Relation between musk odorant properties and benzenoid structure
  – Training set of 148 compounds (81 non-musk and 67 musk)
  – 47 chemical descriptors initially
  – Pre-qualifications
    • Correlations (47 - 12 = 35)
  – Post-qualifications
    • Bootstrapping
    • Test set
      – 6/6 musks, 8/9 non-musks
Narvaez, J. N., Lavine, B. K. and Jurs, P. C. Chemical Senses, 11, 145-156 (1986)

Practical Issues
• 10 times as many compounds as parameters fit
• 3-5 compounds per descriptor
• Traditional QSAR
  – Good for activity prediction
  – Not good for determining whether activity is due to binding or transport

Advanced Methods
• Neural Networks
• Support Vector Machines
• Genetic/Evolutionary Algorithms
• Monte Carlo
• Alternate descriptors
  – Reduced graphs
  – Molecular connectivity indices
  – Indicator variables (0 or 1)
• Combinatorics (e.g. multiple substituent sites)

Tools Available
• Sybyl (Tripos Inc.)
• Insight II (Accelrys Inc.)
• Pole Bio-Informatique Lyonnais
  – http://pbil.univ-lyon1.fr/
• Molecular Biology
  – http://www.infobiogen.fr/services/deambulum/english/logiciels.html

Summary
• QSAR/QSPR
  – Statistics connect structure/behavior with observables
  – Interpolate/Extrapolate
• Multi-Variate Analysis
  – Pre-Qualification
  – Regression
    • PCA
    • PLS
    • MLS
  – Post-Qualification