Poster wellcome-trust-2013-herhaus-fuzzy-mcs

1
Improving Cluster Representations by "Fuzzifying" Maximum Common Substructures Introduction Arranging similar structures in clusters is one of the typical tasks of modern Chemoinformatics with high impact in HTS follow-up, generation of structure activity relationships (SAR) and selection of starting points for compound optimization. Methods for cluster generation are as diverse as the structures which they are applied to [1], may they be e.g. similarity- or substructure-based. Typically, medicinal chemists tend to orientate themselves in structure subsets like clusters with the help of substructures, so-called "scaffolds", which intuitively characterize the structural relationships between the molecules of the subset. In the case of substructure-based clustering, well established methods are existing for generation of Maximum Common Substructures (MCS) which are present in all members of the structure population or a defined proportion thereof [2]. But in the case of similarity- based clusters, such MCS may either not be existing for the required dataset proportion or the common substructure may be so small that it is no longer representative and therefore lacking information. Aim of this approach is to provide descriptive scaffold structures also for clusters whose intrinsic structural diversity is too high to be characterized by conventional Maximum Common Substructure approaches. Christian Herhaus Merck KGaA, Computational Chemistry, Frankfurter Str. 250, 64293 Darmstadt, [email protected] Results The approach presented here allows the generation of MCS also for similarity-based clusters with a given inherent structural diversity. It does so by utilizing query features like atoms lists and query bonds to add "fuzziness" in atom and bond information to the MCS. As a result, the generated "fuzzy" MCS, although still being fully database-searchable, is more meaningful for the characterisation of clusters as it can cover larger parts of the full structures than a conventional MCS could do. The approach was implemented in Pipeline Pilot™ for proof of concept but is general enough to be transferred to other technical platforms as well. Conventional MCS "Fuzzy" MCS Methods Content of the Pipeline Pilot™ component "Generate Fuzzy MCS" (slightly simplified) 1. Reduced graphs 2. Generalized MCS any any any any any any any any any any any any any any any any any any any any any any any any any any any any any any 6. Query bonds coded as stereo bonds 7. Final "fuzzy" MCS any s/d s/d s/d any s/d s/d s/d [1] Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. In Reviews in Computational Chemistry (18). Chichester: Wiley and Sons; 2003, 1-40. [2] Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci 2011, 1:68-79. Where to get / How to contribute The current version of the component is published on the Accelrys ® Pipeline Pilot™ user forum and can be downloaded from https://community.accelrys.com/message/16528. Contributors are appreciated for improving the approach and for transfering the concept to other technology platforms like RDKit, CDK or KNIME. Current limitations • Aromaticity may cause unexpected effects (e.g. single/double vs. aromatic bonds) • Symmetric substructures may blur atom/bond information (e.g. carboxyl or nitro groups) • Currently, just one presence of an atom/bond feature leads to inclusion into a query feature. For larger clusters, a percentual threshold may be required • Different ring sizes and chain lengths can so far not be described by query patterns. 5. Collected atom/bond information transfered into queries Atom No Elements 1 O,O,S 7 N,N,O 11 C,C,C Bond No Types 3 1,1,1 9 1,2,1 14 2,1,4 Atom No Element 1 [O,S] 7 [N,O] 11 C Bond No Type 3 single 9 single/double 14 aromatic 3. Generalized MCS mapped onto structures 1 9 6 11 17 20 7 16 13 4. Structures reduced & mappings normalized 1 1 1 7 7 7 11 11 11

Transcript of Poster wellcome-trust-2013-herhaus-fuzzy-mcs

Page 1: Poster wellcome-trust-2013-herhaus-fuzzy-mcs

Improving Cluster Representations by "Fuzzifying" Maximum Common Substructures

IntroductionArranging similar structures in clusters is one of the typical tasks of modern

Chemoinformatics with high impact in HTS follow-up, generation of structure activity

relationships (SAR) and selection of starting points for compound optimization.

Methods for cluster generation are as diverse as the structures which they are applied to

[1], may they be e.g. similarity- or substructure-based. Typically, medicinal chemists tend to orientate themselves in structure subsets like clusters with the help of substructures,

so-called "scaffolds", which intuitively characterize the structural relationships between

the molecules of the subset.

In the case of substructure-based clustering, well established methods are existing for

generation of Maximum Common Substructures (MCS) which are present in all members

of the structure population or a defined proportion thereof [2]. But in the case of similarity-

based clusters, such MCS may either not be existing for the required dataset proportion

or the common substructure may be so small that it is no longer representative and

therefore lacking information.

Aim of this approach is to provide descriptive scaffold structures also for clusters whose

intrinsic structural diversity is too high to be characterized by conventional Maximum

Common Substructure approaches.

Christian HerhausMerck KGaA, Computational Chemistry, Frankfurter Str. 250, 64293 Darmstadt, [email protected]

ResultsThe approach presented here allows the generation of MCS also for similarity-based

clusters with a given inherent structural diversity. It does so by utilizing query features like

atoms lists and query bonds to add "fuzziness" in atom and bond information to the MCS.

As a result, the generated "fuzzy" MCS, although still being fully database-searchable, is

more meaningful for the characterisation of clusters as it can cover larger parts of the full structures than a conventional MCS could do. The approach was implemented in Pipeline

Pilot™ for proof of concept but is general enough to be transferred to other technical

platforms as well.

Conventional

MCS

"Fuzzy"

MCSMethodsContent of the Pipeline Pilot™ component "Generate Fuzzy MCS" (slightly simplified)

1. Reduced graphs 2. Generalized MCS

any any

any

anyany

any

anyany

any anyany any

any

anyany

any any

any

anyany

any

any any

any any

any any

any any

any

6. Query bonds coded as stereo bonds

7. Final "fuzzy" MCS

any

s/ds/d

s/d

any

s/ds/d

s/d

[1] Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. In Reviews in Computational Chemistry (18). Chichester: Wiley and Sons; 2003, 1-40.[2] Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci 2011, 1:68-79.

Where to get / How to contributeThe current version of the component is published on the Accelrys® Pipeline

Pilot™ user forum and can be downloaded from

https://community.accelrys.com/message/16528.

Contributors are appreciated for improving the approach and for transfering the concept to other technology platforms like RDKit, CDK or KNIME.

Current limitations• Aromaticity may cause unexpected effects (e.g. single/double vs. aromatic bonds)

• Symmetric substructures may blur atom/bond information (e.g. carboxyl or nitro groups)

• Currently, just one presence of an atom/bond feature leads to inclusion into a query feature. For larger clusters, a percentual threshold may be required

• Different ring sizes and chain lengths can so far not be described by query patterns.

5. Collected atom/bond information transfered into queries

Atom No Elements

1 O,O,S

7 N,N,O

11 C,C,C

Bond No Types

3 1,1,1

9 1,2,1

14 2,1,4

Atom No Element

1 [O,S]

7 [N,O]

11 C

Bond No Type

3 single

9 single/double

14 aromatic

3. Generalized MCS mapped onto structures

1

9

6

1117

20

7

16

13

4. Structures reduced & mappings normalized

1

1

1

7

7

7

11

11

11