
Clustering of Small Molecules Based on Similarity Scores From Flexible 3D Alignment

Adrian Kalaszi, Gabor Imre, Miklos J. Szabo, Timea Polgar, Krisztian Niesz

ChemAxon Ltd., 1031 Budapest, Zahony u. 7, Hungary

Abstract

There are several approaches to clustering chemical structures. Among these, structure-based methods and techniques using classical 2D descriptors (e.g. chemical fingerprints or ECFP) are the most widely used. Considering 3D information, such as conformers, 3D pharmacophore maps or molecular shapes, can give researchers more insight into the clustering process and make the results easier to interpret. ChemAxon’s 3D alignment tool provides automatic, shape-based flexible 3D alignment of small molecules. The shape similarity scores calculated for the best fits can then be used in similarity-based clustering, for example as part of scaffold hopping to find new lead molecules.

Molecular Med TRI-CON 2013, February 11-15, 2013

Introduction

It is generally accepted that molecular shape plays a central role in ligand binding. A growing number of publications describe descriptors and methods for shape-based similarity screening [1]. These methods compete with other virtual screening techniques, such as ligand-based and structure-based methods [2, 3]. It is therefore expected that 3D shape alignment-based similarity can also add new information in clustering, beyond what traditional 2D clustering methods provide.

ChemAxon in 3D

3D structure generation / conformational analysis

Generate3D [4,5] is the molecular coordinate generation and conformational analysis component of ChemAxon’s discovery tools (released in 2002). It is used by the Structure / Clean3D function of the Marvin GUI, by the Conformers Calculator Plugins, and by the molconvert command line tool.

3D flexible alignment

The 3D flexible alignment procedure (released in 2009; [6]) overlays two structures by maximizing the intersection of their van der Waals volumes. The volume is partitioned by underlying atomic properties, such as extended atom types (force field types) or pharmacophoric types. Both molecules can be treated as flexible: their rotatable bonds, flexible rings and ring systems are adjusted continuously during the alignment. A single 3D conformer of each structure is used as input for the alignment procedure. For 2D / 0D input structures, this conformer is generated automatically by calling Generate3D, so the method provides valid 3D similarity scores for such input as well. After the alignment is completed, the size of the volume intersection and the 3D Tanimoto (a dimensionless measure of similarity between 0 and 1) can be obtained for further processing.
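The relation between the volume intersection and the 3D Tanimoto can be sketched as follows. This is a minimal illustration that assumes the usual volume-based Tanimoto definition (overlap divided by union); the poster does not state the exact formula used by the alignment engine, and the function name and the example volumes are hypothetical.

```python
def shape_tanimoto(vol_a: float, vol_b: float, vol_overlap: float) -> float:
    """Volume-based Tanimoto: overlap volume divided by union volume.

    Assumed definition: T = V_AB / (V_A + V_B - V_AB), which lies in [0, 1].
    """
    if vol_a <= 0 or vol_b <= 0:
        raise ValueError("molecular volumes must be positive")
    if not 0 <= vol_overlap <= min(vol_a, vol_b):
        raise ValueError("overlap must lie between 0 and the smaller volume")
    union = vol_a + vol_b - vol_overlap
    return vol_overlap / union


# Hypothetical volumes in cubic angstroms, for illustration only.
example_score = shape_tanimoto(250.0, 300.0, 200.0)
```

Identical, perfectly overlaid shapes give a score of 1, and shapes with no overlap give 0, which matches the 0 to 1 range described above.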

Example alignment workflow: 1) 2D input structures; 2) a 3D conformer is generated and the shape is colored by atomic types; 3) the volume intersection, which is maximized during the alignment, is shown along with the resulting pose.

3D similarity – ligand based virtual screening

Screen3D is a ligand-based 3D similarity calculation tool released in 2010. Screen3D calculates the intersection of the colored shapes and the 3D Tanimoto. Apart from these shape-based measures, Screen3D can also return a 3D similarity score calculated from interatomic distance ranges [7]. The distance ranges are calculated for each molecule by adjusting rotatable bonds to maximize or minimize the distance between every pair of the selected atoms. The distance range similarity score is comparable in screening performance to its shape-based counterpart.

Benchmark results: Venkatraman et al. [2] compared the performance of various 2D and 3D similarity methods on the Directory of Useful Decoys (DUD) [8]. The values shown as bluish columns are taken from their work; the Screen3D results, shown in orange, were measured in house following the approach of that publication. (SCREEN3D_S8V: shape similarity with volume intersection score; SCREEN3D_S8T: shape similarity with 3D Tanimoto score; SCREEN3D_H: distance range based similarity.)

Clustering - JKlustor

The JKlustor Suite [9, 10] performs similarity-based and structure-based clustering of compound libraries and focused sets in both hierarchical and non-hierarchical fashion. In addition, the JKlustor Suite can carry out diversity calculations and library comparisons based on molecular fingerprints and other descriptors. It is an essential tool in combinatorial chemistry, virtual library design and other areas where a large number of compounds need to be analyzed. The approach presented here introduces 3D flexible alignment-based similarity calculation to the JKlustor Suite. This allows the available similarity-based algorithms to use 3D structural data in clustering.

[Figure labels: 2D (0D) input; Aligned structure pair; Aligned shapes; Flexibly aligned results]

ChemAxon Graphisoft Park, Hx Building H-1037 Budapest, Hungary

Phone: +36 1 453 2660 Fax: +36 1 453 2659 http://www.chemaxon.com

[Figure: overview diagram. Group labels: 2D (0D) structure based algorithms; Molecular descriptors; Similarity metrics; Similarity-based clustering; Structure-based clustering. Items shown: Structural frameworks, MCS, MCES, 3D flexible alignment, Chemical hashed fp, BCUT-like*, ECFP, Pharmacophore 2D fp*, Calculated property-based*, User defined, FCFP*, Euclidean, Tanimoto, Intersection, Sphere exclusion, K-means, Ward’s minimum variance*, Jarvis-Patrick*.]

An overview of algorithms and descriptors available to use from the JKlustor suite.

*Note: some components are available as standalone tools
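Two of the similarity metrics named in the overview, Tanimoto and Euclidean, can be illustrated for binary fingerprints. The sketch below represents a fingerprint as a set of on-bit indices; this representation, the function names and the handling of empty fingerprints are assumptions made for the example and do not describe JKlustor internals.

```python
import math
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as on-bit sets."""
    if not bits_a and not bits_b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    common = len(bits_a & bits_b)
    return common / (len(bits_a) + len(bits_b) - common)


def euclidean(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Euclidean distance of two binary fingerprints.

    For 0/1 vectors the squared distance equals the number of differing bits.
    """
    return math.sqrt(len(bits_a ^ bits_b))


# Hypothetical fingerprints, for illustration only.
fp_1 = {1, 2, 3}
fp_2 = {2, 3, 4}
sim_12 = tanimoto(fp_1, fp_2)
dist_12 = euclidean(fp_1, fp_2)
```

Tanimoto is a similarity (higher means more alike), while Euclidean is a distance (lower means more alike), so a clustering radius has to be interpreted according to the chosen metric.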

Proof of concept implementation

The interface to the 3D flexible alignment functionality has been implemented in JKlustor as a transparent pairwise similarity calculation. A visualization tool is also provided to compare the results of the alignment-based similarity calculation with other descriptor implementations.

[Figure: architecture diagram. Components shown: Input structures (SMILES); Generate 3D, Aromaticity, H atoms, Structure ID; Descriptor interface; Descriptor cache; Cache file (DB); 3D flex. align. interface; Flexible 3D alignment engine; Algorithm interface; Sphere exclusion; Orchestration, execution, hierarchy representation; Output; Visualization, UI (Web-enabled); UI Client; Structures, clusters, ...]

Architecture of the JKlustor extension. The interaction points with the user, the standard JKlustor elements, and the interaction points with the flexible 3D alignment engine are depicted.
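The combination of a pairwise similarity interface and a descriptor cache keyed by structure ID can be sketched as below. This is an illustrative pattern only: the class name, the callback signature and the in-memory dictionary standing in for the cache file are assumptions, and the alignment engine is replaced by a stub lookup table with made-up scores.

```python
from typing import Callable, Dict, Tuple


class CachedPairwiseSimilarity:
    """Wraps a symmetric pairwise similarity function and caches its results."""

    def __init__(self, score: Callable[[str, str], float]) -> None:
        self._score = score
        self._cache: Dict[Tuple[str, str], float] = {}
        self.calls = 0  # number of times the expensive scorer was invoked

    def dissimilarity(self, id_a: str, id_b: str) -> float:
        """Return 1 - similarity, computing each unordered pair at most once."""
        if id_a == id_b:
            return 0.0
        # Order the key so that (a, b) and (b, a) share one cache entry.
        key = (id_a, id_b) if id_a < id_b else (id_b, id_a)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = 1.0 - self._score(*key)
        return self._cache[key]


# Stub standing in for the flexible 3D alignment engine (hypothetical score).
toy_scores = {("mol1", "mol2"): 0.8}
sim = CachedPairwiseSimilarity(lambda a, b: toy_scores[(a, b)])
d_first = sim.dissimilarity("mol1", "mol2")
d_second = sim.dissimilarity("mol2", "mol1")  # served from the cache
```

Caching matters here because each flexible alignment is far more expensive than a fingerprint comparison, and clustering algorithms typically request the same pair more than once.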

Clustering results of a small 3D fragment library

Clusters (centroids) resulting from sphere exclusion clustering (r = 0.4) of the heat shock protein 90 (HSP90) ligands contained in the DUD database. Bar lengths are proportional to cluster size.
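Sphere exclusion clustering can be sketched in a few lines. The version below uses the simplest centroid rule (the first unassigned item in input order becomes the next centroid) and treats the radius as a dissimilarity threshold; both are assumptions, since the poster does not state how JKlustor selects centroids or interprets r. The one-dimensional toy data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def sphere_exclusion(
    items: Sequence[str],
    dissim: Callable[[str, str], float],
    radius: float,
) -> Dict[str, List[str]]:
    """Cluster items so that every member lies within `radius` of its centroid.

    Returns a mapping from centroid to its members (centroid included).
    """
    clusters: Dict[str, List[str]] = {}
    remaining = list(items)
    while remaining:
        centroid = remaining[0]
        # Collect every still-unassigned item inside the exclusion sphere.
        members = [
            m for m in remaining
            if m == centroid or dissim(centroid, m) <= radius
        ]
        clusters[centroid] = members
        assigned = set(members)
        remaining = [m for m in remaining if m not in assigned]
    return clusters


# Toy data: items placed on a line, dissimilarity is the absolute distance.
positions = {"a": 0.0, "b": 0.1, "c": 0.5, "d": 0.55, "e": 0.95}
toy_clusters = sphere_exclusion(
    list(positions),
    lambda x, y: abs(positions[x] - positions[y]),
    radius=0.4,
)
```

With these toy values the result is three clusters centered on a, c and e. A smaller radius yields more and tighter clusters; a larger radius yields fewer and broader ones.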

References

[1] Haigh, J. A.; Pickup, B. T.; Grant, J. A.; Nicholls, A.: Small Molecule shape-fingerprints. J. Chem. Inf. Model. 2005, 45, 673-684.

[2] Venkatraman, V.; Perez-Nueno, V. I.; Mavridis, L.; Ritchie, D. W.: Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J. Chem. Inf. Model. 2010, 50, 2079-93.

[3] Hu, G.; Kuang, G.; Xiao, W.; Li, W.; Liu, G.; Tang, Y.: Performance Evaluation of 2D Fingerprint and 3D Shape Similarity Methods in Virtual Screening. J. Chem. Inf. Model. 2012, 52, 1103-1113.

[4] http://www.chemaxon.com/marvin/help/calculations/conformation.html#conformer

[5] http://www.chemaxon.com/conf/Advanced_automatic_generation_of_3D_molecular_structures.pdf

[6] Marvin 5.1.2, 2012, ChemAxon (http://www.chemaxon.com)

[7] Deng, W.; Kalászi, A.: Screen3D: A Ligand-based 3D Similarity Search without Conformational Sampling. International Conference and Exhibition on Computer Aided Drug Design & QSAR, Oct 29th, 2012, Chicago, IL, USA.

[8] Irwin, J. J.: Community benchmarks for virtual screening. J. Comput.-Aided Mol. Des. 2008, 22, 193-9.

[9] http://www.chemaxon.com/products/jklustor/

[10] http://www.chemaxon.com/conf/JKlustor.ppt


Using 3D similarity approaches to identify scaffold hopping cases

In most scaffold hopping cases, the compared molecules are very similar in their 3D properties but quite different in their 2D representation. It is therefore proposed that such cases can be detected by comparing the calculated 2D and 3D similarities.
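The proposed comparison can be sketched as a simple filter over molecule pairs: keep pairs that are dissimilar in 2D but similar in 3D. The thresholds, the function name and the example values below are hypothetical and chosen only to illustrate the idea; the poster does not specify cutoffs.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def scaffold_hop_candidates(
    dissim_2d: Dict[Pair, float],
    dissim_3d: Dict[Pair, float],
    min_2d: float = 0.7,   # assumed cutoff: "quite different" in 2D
    max_3d: float = 0.3,   # assumed cutoff: "very similar" in 3D
) -> List[Pair]:
    """Return pairs with high 2D and low 3D dissimilarity, best candidates first."""
    hits = [
        pair for pair in dissim_2d
        if pair in dissim_3d
        and dissim_2d[pair] >= min_2d
        and dissim_3d[pair] <= max_3d
    ]
    # Rank by how strongly the 2D and 3D views disagree.
    return sorted(hits, key=lambda p: dissim_2d[p] - dissim_3d[p], reverse=True)


# Made-up dissimilarities for three placeholder molecules A, B and C.
ecfp_dissim = {("A", "B"): 0.85, ("A", "C"): 0.20, ("B", "C"): 0.90}
shape_dissim = {("A", "B"): 0.15, ("A", "C"): 0.10, ("B", "C"): 0.60}
candidates = scaffold_hop_candidates(ecfp_dissim, shape_dissim)
```

In a 2D vs. 3D dissimilarity plot, these candidates fall in the corner with high 2D dissimilarity and low 3D dissimilarity.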

[Figure axis labels: 3D SHAPE dissimilarity; 2D ECFP dissimilarity]

A) Scaffold hopping cases for antihistamine drugs: 3D shape similarity values / 2D ECFP similarity values together with the corresponding pairwise 3D alignment of the molecules; B) the corresponding molecular pairs shown in the 2D ECFP vs. 3D SHAPE dissimilarity space.
