Quantitative Structure-Activity Relationship (QSAR) · Ligand-based approach • Structure-Activity...

Ahmad Reza Mehdipour, 07.11.2017

Quantitative Structure-Activity Relationship (QSAR)

http://www.biophys.mpg.de/en/theoretical-biophysics/computational-drug-design.html


Page 2: Quantitative Structure-Activity Relationship (QSAR) · Ligand-based approach • Structure-Activity Relationships (SAR) • Quantitative Structure-Activity Relationships (QSAR) Molecular

Course Outline

1. Ligand-based approaches
   1. (Quantitative) structure-activity relationship (SAR & QSAR)
   2. Pharmacophore modeling

2. Bioinformatics approaches (target recognition and structural modeling)
   1. Sequence alignments and searches
   2. Gene identification and prediction
   3. Homology modeling

3. Structure-based approaches
   1. Molecular docking
      1. Ligand docking: theory and scoring functions
      2. Virtual screening
      3. Protein-protein docking and interaction
   2. Molecular dynamics simulation
      1. Introduction into molecular dynamics
   3. Estimation of ligand binding affinity
      1. Free energy perturbation
      2. Enhanced sampling methods

Page 3

Ligand-based approach

• Structure-Activity Relationships (SAR)

• Quantitative Structure-Activity Relationships (QSAR)

$$\text{Biological activity} = f(\text{Molecular descriptors})$$

Page 4

QSAR: Historical perspective

1900. Meyer-Overton

Public Domain, https://commons.wikimedia.org/w/index.php?curid=6597630

Page 5

QSAR: Historical perspective

1964. Hansch analysis

Hansch & Fujita, JACS 1964

$$\log\frac{1}{C} = -k\pi^2 + k'\pi\pi_0 - k''\pi_0^2 + \log k_X + k'''$$

Page 6

Quantitative Structure-Activity Relationships (QSAR)

Definition

QSAR builds a mathematical model that correlates a set of structural descriptors of a set of chemical compounds with their biological activity.

More generally (QYXR), it builds a mathematical model that correlates a set of independent variables (X) of a set of samples with a set of dependent variables (Y).

Page 7

Quantitative Structure-Activity Relationships (QSAR)

1. Set of compounds

4. Biological activities

Page 8

Considerations

All compounds should belong to a congeneric series

Same mechanism of action

A similar binding mechanism

The measured biological activity should be exactly the same for all compounds

Biological activity is correlated to binding affinity

Page 9

Quantitative Structure-Activity Relationships (QSAR)

1. Set of compounds

2. Molecular descriptors

4. Biological activities

Page 10

Quantitative Structure-Activity Relationships (QSAR)

1. Set of compounds

2. Molecular descriptors

3. Mathematical models

4. Biological activities

$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

Multiple Linear Regression (MLR)

Partial Least Squares (PLS)

Artificial Neural Network (ANN)

Genetic Algorithm (GA)

Page 11

Molecular descriptors


Page 12

Molecular descriptors

1D descriptors: molecular weight, LogP, number of functional groups

2D descriptors: topological indices

3D descriptors: geometrical parameters, molecular surfaces, quantum chemistry descriptors

Page 13

2D descriptors

Topological indices based on adjacency matrix

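Example: a molecular graph with six atoms, numbered 1 to 6, and bonds 1-3, 2-3, 3-4, 4-5 and 4-6 (the atom pairs with $d_{ij} = 1$). Its distance matrix, where $d_{ij}$ is the number of bonds on the shortest path between atoms $i$ and $j$, is

$$D = \begin{pmatrix}
0 & 2 & 1 & 2 & 3 & 3 \\
2 & 0 & 1 & 2 & 3 & 3 \\
1 & 1 & 0 & 1 & 2 & 2 \\
2 & 2 & 1 & 0 & 1 & 1 \\
3 & 3 & 2 & 1 & 0 & 2 \\
3 & 3 & 2 & 1 & 2 & 0
\end{pmatrix}$$

The topological index is half the sum of all matrix entries:

$$TI = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij} = 29$$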
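The topological index on this slide can be checked with a short script. This is a minimal sketch, assuming the six-atom graph has the bonds 1-3, 2-3, 3-4, 4-5 and 4-6 (read off the slide's matrix) and that the index is half the sum of all shortest-path distances; the function names `distance_matrix` and `topological_index` are made up for this illustration.

```python
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Shortest-path distances (in bonds) between all atom pairs, via BFS.

    Assumes a connected graph with atoms numbered 1..n_atoms.
    """
    neighbours = {a: [] for a in range(1, n_atoms + 1)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = []
    for start in range(1, n_atoms + 1):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen[nxt] = seen[atom] + 1
                    queue.append(nxt)
        dist.append([seen[a] for a in range(1, n_atoms + 1)])
    return dist

def topological_index(dist):
    """Half the sum of all entries of the symmetric distance matrix."""
    return sum(sum(row) for row in dist) // 2

# Bond list assumed from the slide's example graph.
BONDS = [(1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]
D = distance_matrix(6, BONDS)
TI = topological_index(D)
print(TI)
```

The breadth-first search is used only because every bond counts as one step; weighted graphs would need a different shortest-path routine.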

Page 14

3D descriptors

Quantum chemical descriptors

Descriptors calculated by quantum mechanical methods (semi-empirical, ab initio or DFT)

Partial atomic charges

Lowest unoccupied molecular orbital energy (LUMO)

Highest occupied molecular orbital energy (HOMO)

Electrostatic potential

Molecular polarizability

Page 15

Molecular descriptor software

Dragon

GAUSSIAN

HyperChem

CODESSA

MOE

Page 16

Quantitative Structure-Activity Relationships (QSAR)

1. Set of compounds

2. Molecular descriptors

3. Mathematical models

4. Biological activities

$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

Multiple Linear Regression (MLR)

Partial Least Squares (PLS)

Artificial Neural Network (ANN)

Genetic Algorithm (GA)

Page 17

Multiple Linear Regression (MLR)

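$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

where $b_0$ is the intercept and $b_1, \ldots, b_n$ are the coefficients.

Objective function (summed over the $p$ compounds):

$$\sum_{i=1}^{p}\left(Y_i - b_0 - b_1X_{1,i} - \cdots - b_nX_{n,i}\right)^2$$

Least-squares solution and fitted values:

$$b = (X'X)^{-1}X'Y$$

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y$$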
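The least-squares solution $b = (X'X)^{-1}X'Y$ can be sketched in plain Python. This is an illustration only, not the lecture's implementation: the helper names `fit_mlr` and `predict` and the five-compound demo data are invented, and the normal equations are solved by Gauss-Jordan elimination instead of forming the inverse explicitly.

```python
def fit_mlr(X, y):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column of ones
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: descriptors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * c for a, c in zip(A[idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coeffs, X):
    """Fitted values Y_hat = b0 + b1*X1 + ... + bn*Xn for each row of X."""
    return [coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row)) for row in X]

# Invented demo data generated exactly from Y = 1 + 2*X1 - 3*X2.
X_DEMO = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
Y_DEMO = [1, 3, -2, 0, 2]
coeffs = fit_mlr(X_DEMO, Y_DEMO)
print([round(c, 6) for c in coeffs])
```

Because the demo activities contain no noise, the fit recovers the generating coefficients; with real data the residuals would be non-zero.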

Page 18

Multiple Linear Regression (MLR)

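$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

For $p$ compounds and $n$ descriptors, with $\hat{Y}_i$ the estimated and $Y_i$ the experimental activity:

Residual (experimental minus estimated):

$$e_i = Y_i - \hat{Y}_i$$

Residual variance:

$$\sigma^2 = \frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{p - n - 1}$$

Coefficient of determination:

$$R^2 = 1 - \frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{p}(Y_i - \bar{Y})^2}$$

F statistic:

$$F = \frac{\sum_{i=1}^{p}(\hat{Y}_i - \bar{Y})^2 / n}{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2 / (p - n - 1)}$$

Akaike Information Criterion:

$$\mathrm{AIC} = p\,\log\!\left(\frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{p}\right) + 2(n + 1)$$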
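The fit statistics on this slide can be computed from the experimental and estimated activities alone. The sketch below assumes the usual definitions (residual variance with $p - n - 1$ degrees of freedom, natural logarithm in the AIC); the function name `regression_stats` and the five-point demo data are invented for illustration.

```python
import math

def regression_stats(y, y_hat, n_desc):
    """sigma^2, R^2, F and AIC for a model with n_desc descriptors plus intercept."""
    p = len(y)
    y_mean = sum(y) / p
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    ess = sum((fi - y_mean) ** 2 for fi in y_hat)           # explained sum of squares
    dof = p - n_desc - 1
    return {
        "sigma2": rss / dof,
        "R2": 1.0 - rss / tss,
        "F": (ess / n_desc) / (rss / dof),
        "AIC": p * math.log(rss / p) + 2 * (n_desc + 1),
    }

# Invented data: y_hat is the least-squares line 0.6 + 0.8*x for x = 1..5.
Y_EXP = [1, 3, 2, 5, 4]
Y_EST = [1.4, 2.2, 3.0, 3.8, 4.6]
stats = regression_stats(Y_EXP, Y_EST, n_desc=1)
print({name: round(value, 3) for name, value in stats.items()})
```

A perfect fit gives a residual sum of squares of zero, for which the logarithm in the AIC is undefined; real code would need to guard against that case.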

Page 19

Multiple Linear Regression (MLR)

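| No. | X1 | X2 | X3 | X4 | Yexp | Ycalc | Residual |
|---|---|---|---|---|---|---|---|
| 1 | 3.42 | 38.51 | 6.62 | 6.63 | 3 | 2.9 | 0.1 |
| 2 | 3.05 | 38.91 | 6.61 | 6.04 | 3.15 | 3.37 | -0.22 |
| 3 | 2.52 | 54.28 | 6.58 | 6.23 | 3.28 | 3.07 | 0.21 |
| 4 | 3.29 | 54.27 | 6.63 | 6.09 | 4.24 | 3.91 | 0.33 |
| 5 | 2.25 | 54.62 | 6.61 | 6.03 | 3.28 | 3.14 | 0.14 |
| 6 | 2.42 | 55.37 | 6.59 | 5.67 | 4.35 | 3.75 | 0.6 |
| 7 | 3.15 | 70.6 | 6.67 | 6.51 | 3.88 | 3.69 | 0.19 |
| 8 | 1.67 | 69.77 | 6.49 | 5.79 | 3.64 | 3.3 | 0.34 |
| 9 | 2.91 | 70.03 | 6.64 | 6.11 | 4.35 | 3.99 | 0.36 |
| 10 | 1.73 | 70.57 | 6.61 | 6.04 | 3.4 | 3.11 | 0.29 |
| 11 | 1.36 | 86.18 | 6.64 | 6.12 | 3.3 | 3.12 | 0.18 |
| 12 | 2.81 | 85.83 | 6.62 | 6.05 | 4.7 | 4.38 | 0.32 |
| 13 | 2.96 | 102.96 | 6.66 | 6.52 | 4.67 | 4.35 | 0.32 |
| 14 | 0.65 | 102.7 | 6.61 | 6.04 | 3.34 | 3.06 | 0.28 |
| 15 | 2.22 | 117.89 | 6.62 | 6.04 | 4.11 | 4.74 | -0.63 |
| 16 | 0.19 | 118.98 | 6.61 | 6.18 | 3.37 | 2.92 | 0.45 |
| 17 | 2.85 | 135.34 | 6.67 | 6.52 | 5.93 | 5.1 | 0.83 |
| 18 | 0.39 | 134.08 | 6.65 | 6.32 | 3.65 | 3.31 | 0.34 |
| 19 | 3.58 | 22.34 | 6.7 | 6.6 | 2.7 | 2.69 | 0.01 |
| 20 | 3.41 | 54.34 | 6.62 | 6.64 | 3.49 | 3.29 | 0.2 |
| 21 | 0.43 | 77.39 | 1.87 | 4.37 | 1.99 | 1.87 | 0.12 |
| 22 | 0.35 | 93.05 | 1.88 | 4.34 | 2.38 | 2.25 | 0.13 |
| 23 | 0.09 | 109.53 | 1.87 | 4.34 | 2.76 | 2.46 | 0.3 |
| 24 | -0.2 | 125.8 | 1.88 | 4.34 | 3.29 | 2.65 | 0.64 |
| 25 | 1.41 | 87.61 | 0.35 | -14.65 | 0.87 | 0.85 | 0.02 |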

$$Y = 4.224 - 1.305X_1 + 0.535X_2 + 0.026X_3 + 0.817X_4$$

$\sigma^2 = 0.170$, $R^2 = 0.899$, $F = 42.4$

$\sigma^2_Y = 0.712$

Page 20

Variable selection

1. Systematic approaches
   1. Forward selection
   2. Backward elimination

2. Heuristic approaches
   1. Genetic algorithm
   2. Simulated annealing

Page 21

Forward selection

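Candidate descriptors for Y: X1, X2, X3, X4, X5. At each step every remaining descriptor Xn is tried in turn and the one giving the lowest AIC is added.

| Step | Model tried | AIC with X1 | AIC with X2 | AIC with X3 | AIC with X4 | AIC with X5 |
|---|---|---|---|---|---|---|
| 1 | Y = a + Xn | 57.7 | 60.70 | 54.7 | 56.1 | 56.5 |
| 2 | Y = a + X3 + Xn | 56.3 | 47.55 | added | 56.7 | 56.5 |
| 3 | Y = a + X3 + X2 + Xn | 29.4 | added | added | 49.5 | 48.3 |
| 4 | Y = a + X3 + X2 + X1 + Xn | added | added | added | 13.8 | 25.1 |
| 5 | Y = a + X3 + X2 + X1 + X4 + Xn | added | added | added | added | 15.7 |

Adding X5 in step 5 raises the AIC from 13.8 to 15.7, so selection stops at Y = a + X1 + X2 + X3 + X4.

$$\mathrm{AIC} = p\,\log\!\left(\frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{p}\right) + 2(n + 1)$$

where $p$ is the number of compounds and $n$ the number of descriptors in the model.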
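Forward selection driven by the AIC can be sketched as a greedy loop. This is a hedged illustration, not the lecture's code: the helpers `fit_rss`, `aic` and `forward_selection` are hypothetical names, the eight-compound data set is invented, and the least-squares fit uses a small hand-written solver so the block needs only the standard library.

```python
import math

def fit_rss(columns, y):
    """Residual sum of squares of a least-squares fit: intercept + given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):  # Gauss-Jordan; assumes non-collinear descriptors
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * c for a, c in zip(A[idx], A[col])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))

def aic(rss, p, n_desc):
    """AIC = p*log(RSS/p) + 2*(n+1); RSS is floored to avoid log(0)."""
    return p * math.log(max(rss, 1e-300) / p) + 2 * (n_desc + 1)

def forward_selection(descriptors, y):
    """Add the descriptor with the lowest AIC until no addition lowers the AIC."""
    p = len(y)
    selected, history = [], []
    remaining = list(range(len(descriptors)))
    best = float("inf")
    while remaining:
        trials = []
        for j in remaining:
            cols = [descriptors[i] for i in selected + [j]]
            trials.append((aic(fit_rss(cols, y), p, len(cols)), j))
        score, j = min(trials)
        if score >= best:       # no improvement: stop
            break
        best = score
        selected.append(j)
        remaining.remove(j)
        history.append(score)
    return selected, history

# Invented data: the activity follows descriptor 0 closely; 1 and 2 are unrelated.
DESCRIPTORS = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [3, 1, 4, 1, 5, 9, 2, 6],
]
Y_DEMO = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 14.1, 15.9]
selected, history = forward_selection(DESCRIPTORS, Y_DEMO)
print(selected, [round(a, 2) for a in history])
```

Backward elimination follows the same pattern in reverse: start from the full model and drop the descriptor whose removal lowers the AIC most.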

Page 22

Backward elimination

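Candidate descriptors for Y: X1, X2, X3, X4, X5. Starting from the full model, each descriptor is removed in turn and the removal giving the lowest AIC is accepted.

Full model: Y = a + X1 + X2 + X3 + X4 + X5, AIC 15.7

| Step | AIC without X1 | AIC without X2 | AIC without X3 | AIC without X4 | AIC without X5 | Resulting model |
|---|---|---|---|---|---|---|
| 1 | 21.8 | 50.6 | 59.9 | 25.1 | 13.8 | Y = a + X1 + X2 + X3 + X4 |
| 2 | 31.9 | 49.5 | 58.0 | 29.4 | removed | Y = a + X1 + X2 + X3 + X4 |

In step 2 every further removal gives an AIC above 13.8, so the model is left unchanged.

$$\mathrm{AIC} = p\,\log\!\left(\frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{p}\right) + 2(n + 1)$$

where $p$ is the number of compounds and $n$ the number of descriptors in the model.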

Page 23

Genetic algorithm

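GENOME: one position per descriptor, X1 to X10 (1 = descriptor used in the model, 0 = not used)

0 1 0 0 1 0 0 1 0 0

This genome encodes the model below, which is scored by its AIC:

$$Y = b_0 + b_1X_2 + b_2X_5 + b_3X_8$$

Further genomes in the illustration (new genomes are generated by mutation):

0 0 0 0 1 0 0 1 0 1

0 1 0 0 1 0 0 0 1 0

1 0 0 0 1 0 0 1 0 1

0 1 0 0 1 0 0 0 1 1

0 1 0 0 1 0 0 1 0 1

0 1 0 0 1 0 0 0 0 0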
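The genome encoding and the mutation step can be sketched as follows. This covers only the two operations named on the slide (decoding a bit string into a descriptor subset, and mutation); the function names are hypothetical, and scoring each decoded model by AIC, as well as any selection or crossover step, is left out.

```python
import random

DESCRIPTOR_NAMES = [f"X{i}" for i in range(1, 11)]  # X1 ... X10

def decode(genome):
    """Names of the descriptors switched on (bit = 1) in the genome."""
    return [name for bit, name in zip(genome, DESCRIPTOR_NAMES) if bit == 1]

def mutate(genome, position):
    """Copy of the genome with the bit at the given 0-based position flipped."""
    child = list(genome)
    child[position] = 1 - child[position]
    return child

def random_mutation(genome, rng):
    """Flip one randomly chosen bit, using the supplied random generator."""
    return mutate(genome, rng.randrange(len(genome)))

parent = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
print(decode(parent))                       # descriptors used by the parent model
child_a = mutate(parent, 9)                 # switches X10 on
child_b = mutate(parent, 7)                 # switches X8 off
child_c = random_mutation(parent, random.Random(0))
print(child_a, child_b, child_c)
```

In a full genetic algorithm each child would be decoded, fitted, and kept or discarded according to its AIC.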

Page 24

Partial least squares (PLS)

PLS is suited to cases where:

The X-variables are correlated

The number of X-variables is relatively high compared with the number of samples

$$X = TP^{T}, \qquad Y = UQ^{T}$$

$$Y = \beta X + \varepsilon$$

$$U = \beta T + \varepsilon$$

Page 25

Other modeling methods

Non-linear regression

Artificial neural network

Classification methods

Multiple logistic regression

Support vector machine

[Figure: non-linear regression equation in $X_1, \ldots, X_n$; neural network diagram with inputs X1, X2, X3 and output Y]

Page 26

Validation

Validation is required to ensure model quality

Over-fitting

Chance correlation

1. Cross-validation
   1. Leave-one-out
   2. Leave-N-out
2. Bootstrapping
3. External validation (prediction set)
4. Y-randomization

Page 27

Cross-validation

Leave-one-out

Starting from the full set (Y1 to Y20 in the illustration), one sample is left out (Y20, then Y19, and so on). The model is fitted on the remaining samples and used to predict the left-out sample. This is repeated P times, once per sample, giving $R^2_{cv,LOO}$.

Leave-N-out

N samples are left out together (Y17 to Y20, then Y3 to Y6, and so on) and predicted by a model fitted on the remaining samples. This is repeated P/N times, giving $R^2_{cv,LNO}$.
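Both schemes can be sketched with one routine that leaves out blocks of N samples (N = 1 gives leave-one-out). To keep the example short it assumes a model with a single descriptor and takes the cross-validated $R^2$ as $1 - \mathrm{PRESS}/\mathrm{TSS}$; the names `fit_line` and `r2_cv` and the six-sample data set are invented for this illustration.

```python
def fit_line(x, y):
    """Least-squares line for one descriptor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r2_cv(x, y, n_out=1):
    """Cross-validated R^2 = 1 - PRESS/TSS, leaving out n_out samples per round."""
    p = len(y)
    press = 0.0  # sum of squared prediction errors on left-out samples
    for start in range(0, p, n_out):
        held_out = range(start, min(start + n_out, p))
        train = [i for i in range(p) if i not in held_out]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out)
    y_mean = sum(y) / p
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / tss

# Invented data scattered around Y = 1 + 2*X.
X_DEMO = [1, 2, 3, 4, 5, 6]
Y_DEMO = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
r2_loo = r2_cv(X_DEMO, Y_DEMO, n_out=1)   # P = 6 rounds
r2_lno = r2_cv(X_DEMO, Y_DEMO, n_out=2)   # P / N = 3 rounds
print(round(r2_loo, 4), round(r2_lno, 4))
```

The blocks here are contiguous for simplicity; in practice the left-out groups are often chosen at random.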

Page 28

Bootstrapping

From the full set (Y1 to Y20 in the illustration), a randomly chosen group of samples is held out (Y14, Y16, Y18 and Y20 in the first example; Y1, Y6, Y7 and Y10 in the second) and the remaining samples form the training set. This is repeated N times, giving $R^2_{BS}$.

Page 29

External validation

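The full set (Y1 to Y20 in the illustration) is split once into a training set and a prediction set.

Training set: Y1, Y3, Y4, Y5, Y6, Y7, Y8, Y10, Y11, Y12, Y13, Y15, Y16, Y17, Y19

Prediction set: Y2, Y9, Y14, Y18, Y20

Training set -> Variable selection -> Cross-validation -> Final model -> Predict the prediction set -> $R^2_{EV}$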

Page 30

Y-randomization

Original data: the activities Y1 to Y20 are paired with the descriptor rows X1 to X20 and the model is fitted:

$$Y = \beta X + \varepsilon$$

Randomized data: the activities are reassigned to different descriptor rows (in the illustration the Y column is reversed, Y20 to Y1, against X1 to X20) and the model is refitted:

$$Y_{new} = \beta X + \varepsilon$$

This is repeated N times, giving $R^2_{Yrand}$.
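A Y-randomization run can be sketched by shuffling the activities and refitting. The example assumes a single-descriptor model, for which $R^2$ equals the squared Pearson correlation; the names `r_squared` and `y_randomization`, the seed and the ten-sample data set are all invented for illustration.

```python
import random

def r_squared(x, y):
    """R^2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_rounds, seed=0):
    """R^2 values obtained after shuffling the activities n_rounds times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)          # breaks the X-Y pairing
        scores.append(r_squared(x, shuffled))
    return scores

# Invented data scattered around Y = 1 + 2*X.
X_DEMO = list(range(1, 11))
Y_DEMO = [3.3, 4.7, 7.3, 8.7, 11.3, 12.7, 15.3, 16.7, 19.3, 20.7]
r2_true = r_squared(X_DEMO, Y_DEMO)
r2_rand = y_randomization(X_DEMO, Y_DEMO, n_rounds=50)
mean_rand = sum(r2_rand) / len(r2_rand)
print(round(r2_true, 3), round(mean_rand, 3))
```

A real model should give a much higher $R^2$ on the original pairing than on the shuffled ones; similar values would point to chance correlation.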

Page 31

Good model?

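$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

Model Robustness: $\sigma^2$, $R^2$, $F$, (R)MSE

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{p}(Y_i - \hat{Y}_i)^2}{p - 1}}$$

Model Quality: $R^2_{cv,LOO}$, $R^2_{cv,LNO}$, $R^2_{BS}$, $\mathrm{RMSE}_{cv}$

Model Reliability: $R^2_{Yrand}$, $\mathrm{RMSE}_{Yrand}$

Model Predictability: $R^2_{EV}$, $\mathrm{RMSE}_{EV}$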

Page 32

Good model?

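$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

Model Robustness: $\sigma^2$, $R^2 > 0.8$, $F$, (R)MSE

Model Quality: $R^2_{cv,LOO} > 0.6$, $R^2_{cv,LNO} > 0.6$, $R^2_{BS} > 0.6$, $\mathrm{RMSE}_{cv}$

$R^2 - R^2_{cv} < 0.3$

Model Reliability: $R^2_{Yrand} < 0.3$, $\mathrm{RMSE}_{Yrand}$

$R^2 - R^2_{Yrand} > 0.4$

Model Predictability: $R^2_{EV} > 0.6$, $\mathrm{RMSE}_{EV}$

$R^2 - R^2_{EV} < 0.3$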
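The thresholds on this slide can be collected into a simple checklist. This is a sketch only: the function name `check_model` and both sets of input values are hypothetical, and the cut-offs are the ones listed on the slide.

```python
def check_model(r2, r2_cv_loo, r2_cv_lno, r2_bs, r2_yrand, r2_ev):
    """Evaluate each threshold from the slide; returns {criterion: passed}."""
    return {
        "R2 > 0.8": r2 > 0.8,
        "R2cv(LOO) > 0.6": r2_cv_loo > 0.6,
        "R2cv(LNO) > 0.6": r2_cv_lno > 0.6,
        "R2(BS) > 0.6": r2_bs > 0.6,
        "R2 - R2cv(LOO) < 0.3": r2 - r2_cv_loo < 0.3,
        "R2 - R2cv(LNO) < 0.3": r2 - r2_cv_lno < 0.3,
        "R2(Yrand) < 0.3": r2_yrand < 0.3,
        "R2 - R2(Yrand) > 0.4": r2 - r2_yrand > 0.4,
        "R2(EV) > 0.6": r2_ev > 0.6,
        "R2 - R2(EV) < 0.3": r2 - r2_ev < 0.3,
    }

# Hypothetical statistics for two models (not taken from the lecture).
good = check_model(r2=0.90, r2_cv_loo=0.75, r2_cv_lno=0.72,
                   r2_bs=0.70, r2_yrand=0.15, r2_ev=0.78)
overfit = check_model(r2=0.95, r2_cv_loo=0.40, r2_cv_lno=0.45,
                      r2_bs=0.50, r2_yrand=0.35, r2_ev=0.30)
failed = [name for name, ok in overfit.items() if not ok]
print(all(good.values()), failed)
```

The second model fits its training data well but fails the cross-validation, randomization and external checks, which is the pattern expected for an over-fitted model.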

Page 33

Applicability domain

$$Y = b_0 + b_1X_1 + b_2X_2 + \cdots + b_nX_n$$

[Figure: plot with axes X1 and X2]

Principal component analysis

Page 34

Prediction Vs Description

$$Y = 2.34 + 3.5\,\mathrm{VE\_b(e)} - 0.87\,\mathrm{ATS1v} + 3.76\,\mathrm{SM02\_AEA}$$

$\sigma^2 = 0.003$, $R^2 = 0.951$, $F = 260.2$, $R^2_{EV} = 0.891$

VE_b(e): coefficient sum of the last eigenvector from the Burden matrix weighted by Sanderson electronegativity

ATS1v: Broto-Moreau autocorrelation of lag 1 (log function) weighted by van der Waals volume

SM02_AEA: spectral moment of order 2 from the augmented edge adjacency matrix weighted by resonance integral

$$Y = 8.34 + 2.5\,\mathrm{LogP} + 0.93\,\mathrm{NAR}$$

$\sigma^2 = 0.113$, $R^2 = 0.811$, $F = 43.2$, $R^2_{EV} = 0.761$

LogP: logarithm of the water-oil (octanol-water) partition coefficient

NAR: number of aromatic rings