
COSMOsim3D and COSMOsar3D

Alignment and 3D-QSAR based on COSMO surfaces

Karin Wichmann (1), Michael Thormann (2), Andreas Klamt (1)

(1) COSMOlogic GmbH & Co. KG, Imbacher Weg 46, 51379 Leverkusen, Germany
(2) origenis GmbH, Am Klopferspitz 19a, 82152 Martinsried, Germany

Contact: Prof. Dr. Andreas Klamt, COSMOlogic GmbH & Co. KG, Imbacher Weg 46, 51379 Leverkusen, Germany; phone +49 - 2171 - 731680; fax +49 - 2171 - 731689; www.cosmologic.de; [email protected]

Test Case: Virtual Screening

Active ligands for 22 diverse biological targets, each with
• 2-3 ligands as alignment templates
• 4-30 known active ligands

Screening compounds:
• “Random”: Rognan Screening Set selected from the MDL Screening Compounds Directory (850 organic compounds)
• Drugs: active ligands of all 22 biological targets (253 known diverse drugs)

Data and structures from: Ligand-based models of 22 diverse targets using a background of screening molecules, A. E. Cleves, A. N. Jain, J. Med. Chem. 2006, 49, 2921-2938.

[Figure: ROC AUC per target (axis range 0.7 to 1.0) for four series: Surflex-Sim with Rognan screening background, COSMOsim3D with Rognan screening background, Surflex-Sim with drug screening background, COSMOsim3D with drug screening background. Targets shown: Lanosterol Demethylase, D-Ala-D-Ala Carboxypeptidase, Dihydropteroate Synthase, DNA Gyrase, HIV Reverse Transcriptase, L-Type Calcium Channel, Acetylcholinesterase, Angiotensin I Conv. Enz., beta-Adrenergic Receptor, Opioid Receptor Mu, Voltage-Gated Sodium Channel, Estrogen Receptor, Progesterone Receptor, Androgen Receptor, Gluco/Corticosteroid Receptor, COX-I, COX-II, GABAA Barbiturate Site, GABAA Benzodiazepine Site, Muscarinic Acetylcholine Rec., Histamine Receptor, NaCl Cotransporter, Sulfonyl Urea Receptor.]

The Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) is a threshold-independent score that summarizes the trade-off between the true positive rate and the false positive rate in compound screening. A ROC AUC of 1 represents a 100% true positive rate at a false positive rate of 0%. A ROC AUC of 0.5 corresponds to a random selection, where the true positive and false positive rates are equal. The displayed chart summarizes the ROC AUC scores for the 22 diverse targets.
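The ROC AUC can be read as the probability that a randomly chosen active ligand is scored above a randomly chosen background compound. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy scores are illustrative assumptions and are not taken from the poster or from any screening software.

```python
def roc_auc(scores_actives, scores_decoys):
    """Fraction of (active, decoy) pairs in which the active scores higher.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    wins = 0.0
    for a in scores_actives:
        for d in scores_decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(scores_actives) * len(scores_decoys))


# Hypothetical similarity scores, for illustration only.
perfect = roc_auc([0.9, 0.8], [0.3, 0.2, 0.1])  # every active outranks every decoy
mixed = roc_auc([0.9, 0.4], [0.5, 0.3])         # one active is outranked by one decoy
print(perfect, mixed)
```

The quadratic loop is kept for readability; for large screening sets a rank-based computation would be the usual choice.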

COSMOsim3D: Similarity assessment based on the polarity of a solvent accessible surface.

COSMOsar3D: Generates fields based on local σ-profiles for molecular field analysis.

Superposition of eight ligands and their COSMO cavities as obtained from a COSMOsim3D alignment. Molecules and cavities rendered with PyMOL.

Local σ-profiles (LSPs)
• result from projecting the molecular σ-surface onto a 3D grid.
• provide information about
  o electrostatics
  o hydrogen bonding
  o hydrophobic interactions
  o shape
• can be generated from fragmented COSMO files for screening purposes.
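To make the projection step concrete, here is a minimal sketch of how surface segments could be binned into a 4D array indexed by grid cell and σ interval. Everything in it is an assumption for illustration: the function name, the nearest-cell binning (the poster does not specify the interpolation scheme), the toy segment data and the σ bin edges.

```python
import numpy as np


def local_sigma_profiles(positions, areas, sigmas, grid_origin, grid_step,
                         grid_shape, sigma_edges):
    """Accumulate segment areas into an array indexed (ix, iy, iz, i_sigma).

    Nearest-cell binning is an assumption of this sketch. Segments that fall
    outside the grid or outside the sigma range are skipped.
    """
    n_sigma = len(sigma_edges) - 1
    lsp = np.zeros(tuple(grid_shape) + (n_sigma,))
    cells = np.floor((np.asarray(positions) - grid_origin) / grid_step).astype(int)
    sigma_bins = np.digitize(sigmas, sigma_edges) - 1
    for cell, s_bin, area in zip(cells, sigma_bins, areas):
        inside = np.all(cell >= 0) and np.all(cell < grid_shape)
        if inside and 0 <= s_bin < n_sigma:
            lsp[cell[0], cell[1], cell[2], s_bin] += area
    return lsp


# Toy data: three surface segments (positions in Angstrom, sigma in e/A^2).
positions = [[0.2, 0.2, 0.2], [0.4, 0.1, 0.3], [1.5, 0.2, 0.2]]
areas = [1.0, 2.0, 0.5]
sigmas = [-0.015, -0.012, 0.005]
sigma_edges = np.linspace(-0.03, 0.03, 7)  # six sigma bins, hypothetical range

lsp = local_sigma_profiles(positions, areas, sigmas,
                           grid_origin=(0.0, 0.0, 0.0), grid_step=1.0,
                           grid_shape=(2, 1, 1), sigma_edges=sigma_edges)
print(lsp.shape, lsp.sum())
```

In this sketch each grid cell carries its own small σ histogram, so the array holds both where the surface is (shape) and how polar it is there.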

Test Case: Sutherland Data Sets
Alignment and 3D-QSAR

Dataset   Compounds   Training   Test   Inactive
ACE       114         76         38     -
AChE      111         74         37     -
BZR       163         98         49     16
COX2      322         188        94     40
DHFR      397         237        124    36
GPB       66          44         22     -
THERM     76          51         25     -
THR       88          59         29     -

[Figure panels per dataset: original Sutherland alignment and COSMOsim3D alignment.]

COSMOsim3D utilizes LSPs instead of chemical structure or pharmacophores to assess a molecule’s similarity to a template molecule. This enables scaffold hopping and allows users to deal with datasets of different chemotypes.

Arrays of LSPs provide all information required for quantifying ligand-receptor interactions, including desolvation. In COSMOsar3D, this leads to
• increased predictive accuracy
• outstanding robustness with regard to grid step size, grid positioning and random misalignment.

Original Sutherland alignment compared to the unsupervised COSMOsim3D alignment for the eight datasets of the Sutherland collection. Alignment was done with a grid size of 1 Å and 200 random starts. Molecules were rendered with PyMOL.

Individual and average statistics of 3D-QSAR models obtained from COSMOsar3D, O3QMFA and other CoMFA-like methods considered by Sutherland.

M. Thormann, A. Klamt, K. Wichmann, COSMOsim3D: 3D-Similarity and Alignment Based on COSMO Polarization Charge Densities, J. Chem. Inf. Model. 2012, 52 (8), 2149–2156.

A. Klamt, M. Thormann, K. Wichmann, P. Tosco, COSMOsar3D: Molecular Field Analysis Based on Local COSMO σ-Profiles, J. Chem. Inf. Model. 2012, 52 (8), 2157–2164.

Data, aligned structures, and statistics for 7 QSAR methods from: J. J. Sutherland, L. A. O’Brien, D. F. Weaver, A Comparison of Methods for Modeling Quantitative Structure-Activity Relationships, J. Med. Chem. 2004, 47, 5541-5554.

Dependence of the standard deviation of the training (dotted lines) and test set (plain lines) residuals for COSMOsar3D (black) and O3QMFA (gray) models upon systematic variation of the grid step size.

Dependence of the standard deviation of the training (dotted lines) and test set (plain lines) residuals for COSMOsar3D (black) and O3QMFA (gray) models upon systematic translation of the grid center position (0.5-5.0 Å on the three Cartesian axes).

Sensitivity of the internal (dotted lines) and external (plain lines) predictive quality of COSMOsar3D (black) and O3QMFA (gray) models with respect to translational misalignment of individual compounds of increasing amplitude.

COSMOsar3D
• Robustness and Predictivity: Input of a histogram instead of using properties directly leads to very robust performance and increased predictive accuracy.
• Ionic and Neutral Molecules: σ values of neutral and charged species are in the same range, allowing for differently charged compounds in one model.
• Linear Relationship of logKi and LSPs: COSMOsar3D provides a rationale that logKi values should be linear functions of the LSP descriptors.
• Description of Hydrogen Bonding: σ is better suited for the description of hydrogen bonding than the ESP usually employed in MFA (A. Klamt, J. Reinisch, F. Eckert, A. Hellweg, M. Diedenhofen, Polarization charge densities provide a predictive quantification of hydrogen bond energies, Phys. Chem. Chem. Phys. 2012, 14, 955-963).

COSMOsim3D
• Pairwise Alignment and Similarity Assessment: LSPs instead of chemical structure or pharmacophores enable scaffold hopping and dealing with datasets of different chemotypes.

• Multi-template alignment: The superposition of multiple aligned template molecules can be used as a virtual template molecule.

• Ligand-Based Virtual Screening: Ranking potential ligand molecules according to their similarity to a single template or a virtual multi-template allows for enrichment of ligand sets with potential cognate drugs.
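The multi-template and ranking ideas in the list above can be sketched as follows. This is only an illustration under stated assumptions: the LSP arrays are taken to be already aligned on a common grid, the virtual template is formed by averaging (one possible reading of "superposition"), and cosine similarity is a stand-in measure, not the similarity function used by COSMOsim3D. All names and numbers are hypothetical.

```python
import numpy as np


def virtual_template(template_lsps):
    """Average aligned template arrays into one virtual template (assumed scheme)."""
    return np.mean(template_lsps, axis=0)


def cosine_similarity(a, b):
    """Stand-in similarity between two LSP arrays of equal shape."""
    a, b = np.ravel(a), np.ravel(b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_candidates(candidates, template):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(lsp, template))
              for name, lsp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Flattened toy arrays stand in for real 4D LSP arrays.
templates = [np.array([1.0, 0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0, 0.0])]
candidates = {
    "cand_a": np.array([2.0, 0.0, 1.0, 0.0]),
    "cand_b": np.array([0.0, 1.0, 0.0, 1.0]),
    "cand_c": np.array([1.0, 1.0, 0.0, 0.0]),
}

ranking = rank_candidates(candidates, virtual_template(templates))
print(ranking)
```

A ranked list of this kind is what the ROC AUC analysis in the virtual screening test case would be computed from.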

Linear Relationship of logKi and LSPs

Consider a protein receptor R and its aqueous embedding as a locally slightly flexible matrix with a locally varying σ-potential. The free energy of a ligand L in R, assuming that the receptor generates a position-dependent σ-potential, is

$$\mu_R^L = c(R) + \int_{S_{COSMO}^L} \mu_R\bigl(r, \sigma(r)\bigr)\, d^2r = c(R) + \sum_{\nu \in L} a_\nu\, \mu_R(r_\nu, \sigma_\nu)$$

Similarly, the free energy of a solute X in a homogeneous solvent S is

$$\mu_S^X = \int_{S_{COSMO}^X} \mu_S\bigl(\sigma(r)\bigr)\, d^2r + kT \ln \gamma_{comb}(X, S) = c(S) + \sum_{\nu \in X} a_\nu\, \mu_S(\sigma_\nu)$$

Here $a_\nu$ and $\sigma_\nu$ are the surface area and polarization charge density of a surface segment $\nu$ of solute X, and $r_\nu$ is the position of that segment.

Interpolating the position-dependent σ-potential of ligand L in receptor R on a grid gives:

$$\mu_R^L = c(R) + \sum_{\nu \in L} a_\nu\, \mu_R(r_\nu, \sigma_\nu) \approx c(R) + \sum_{i_x, i_y, i_z, i_\sigma} \mu_R(i_x, i_y, i_z, i_\sigma)\, \mathrm{LSP}^L(i_x, i_y, i_z, i_\sigma)$$

The same holds for $\mu_W^L$, the free energy of L in water, even though the σ-potential $\mu_W(i_\sigma)$ is position-independent.

→ Linear relationship to pKi!
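The step from the segment sum to the grid sum can be checked numerically. The sketch below assumes that the receptor σ-potential is constant within each grid cell and σ bin, so that the two sums agree exactly; the constant c(R) is omitted, and the array sizes, random values and variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_segments = 12, 40  # flattened (ix, iy, iz, i_sigma) index, toy sizes

# Hypothetical receptor sigma-potential, one value per grid cell and sigma bin.
mu_R = rng.normal(size=n_cells)

# Each surface segment falls into one cell and carries an area a_nu.
cell_of_segment = rng.integers(0, n_cells, size=n_segments)
areas = rng.uniform(0.1, 0.5, size=n_segments)

# Segment-wise sum: sum over nu of a_nu * mu_R(cell of nu).
segment_sum = float(np.sum(areas * mu_R[cell_of_segment]))

# Grid-wise sum: the LSP collects the area per cell, then one dot product.
lsp = np.bincount(cell_of_segment, weights=areas, minlength=n_cells)
grid_sum = float(mu_R @ lsp)

print(segment_sum, grid_sum)
```

Because the grid sum is a dot product between a receptor-specific vector and the ligand's LSP array, the free energy is linear in the LSP descriptors, which is the rationale given for a linear model of logKi.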