The new JKlustor suite Miklós Vargyas Solutions for Cheminformatics.

The new JKlustor suite

Miklós Vargyas

Solutions for Cheminformatics

Why do we cluster?

• to reduce the number of objects to deal with– group subsets together– represent each group by one member of it

What’s the matter with that?

– tedious• parameter tuning in a trial-and-error fashion

– lack of interpretability• the algorithm does not provide explanation

– often do not meet chemists’ expectation

Why is clustering molecules hard?

• lack of innate spatial arrangement– artificial arrangement

• infinite types of chemical spaces• various ‘distance metrics’• usually high dimensionality (hard to visualize)

– various approaches, no superior one• “best method” depends on application area, and on

actual data

What do we need?

• no/few tuning

• easy to understand simple “explanation”

• novel approach– structure based clustering– Maximum Common Substructure– Molecular frameworks

Maximum Common Substructure

• largest substructure shared by two molecules

• Simple concept! More “human”, visual.

• Yet hard (= expensive (= slow)) to compute.

MCS complexity

• Sub-structure searching – query structure is known, it “only” have to be found

as part of the target structure (subgraph isomorphism)

– graph isomorphism is even “simpler” yet NP-hard• finding the answer can take long (scales exponentially

with respect to the number graph vertexes) in the worst case

• validating an answer is fast

• MCS – “query” structure is not known

– all possible substructures need to be checked• even the number of substructures is exponential!

MCS algorithms

• two camps

backtracking clique detection

ad hoc high mathematical elegance

average complexity is better than worst case

average complexity is same as worst case

dynamic heuristics static (initial) heuristics

coloring is easy coloring is hard

fuzzy matching fussy matching

MCS of a structure set

LibraryMCS: Hierarchical MCS

Intuitive visualization

SAR table view

R-group decomposition

LibraryMCS scales linearly

0 5000 10000 15000 20000 25000 30000 35000

Structure count

Linear (2007)

Clustering performance comparison

0 20000 40000 60000 80000 100000 120000

Structure count

LibraryMCSJarvis-PatrickWard-Murtagh

Behind performance

• MCS search– exhaustive– heuristics

• exact• inexact

• Predictive MCS coupling in clustering– all pairs are not feasible– rich fingerprinting

Live demonstration

• Affect of use of heuristics– on average < 10% misclassifications– useful for obtaining birds-eye-view of a

larger/diverse sets

1M< compounds libraries

• Molecular scaffolds, – Rings, ring systems– Bemis-Murcko frameworks

• Sphere exclusion – Variants… linear scaling

Fast clustering methods

Jklustor roadmap

• In the dev pipeline– IJC integration– Spotfire integration– new dynamic viewer

• Planned– disconnected MCS– multiple class members

Acknowledgements

• Gábor Imre

• Judit Vaskó-Szedlár

• Péter Vadász

The new JKlustor suite Miklós Vargyas Solutions for Cheminformatics.

Documents

Transcript of The new JKlustor suite Miklós Vargyas Solutions for Cheminformatics.

Cheminformatics and mass spectrometry course - Fiehn …fiehnlab.ucdavis.edu/downloads/staff/kind/Teaching/cheminformatics... · Mass Spectrometry meets Cheminformatics ... • Analytical

QUANTUM CHEMINFORMATICS: AN OXYMORON

1 Miklós Vargyas, Judit Papp May, 2005 MarvinSpace – live demo.

Cheminformatics: An overview

Cheminformatics Tools for Enabling Metabolomics Yannick ... · Cheminformatics Tools for Enabling Metabolomics by ... development of cheminformatics tools for data organization and

Cheminformatics and mass spectrometry course - …fiehnlab.ucdavis.edu/downloads/staff/kind/Teaching/cheminformatics... · Mass Spectrometry meets Cheminformatics Tobias Kind and

1 Miklós Vargyas May, 2005 Compound Library Annotation.

Advances in Cheminformatics

Miklós Maros - uploads.documents.cimpress.io

Open Source Cheminformatics

Screen Ligand based virtual screening presented by … maintained by Miklós Vargyas Last update: 13 April 2010.

Ph.D. thesis HELTAI MIKLÓS Gödöllő · Ph.D. thesis HELTAI MIKLÓS Gödöllő 2002. STATUS AND DISTRIBUTION OF CARNIVORES IN HUNGARY Ph.D. thesis HELTAI MIKLÓS Gödöllő 2002.

Cheminformatics and mass spectrometry course - Fiehn Labfiehnlab.ucdavis.edu/downloads/staff/kind/Teaching/cheminformatics-ms... · 1 Welcome! Mass Spectrometry meets ChemInformatics

Bringing cheminformatics toolkits into tune

Cheminformatics toolkits: a personal perspective

Cheminformatics in R

Impromptu - Miklós Maros

Solutions for Cheminformatics

View - Journal of Cheminformatics

C2D Cheminformatics : Methods,Tools and Results By OSDD-Cheminformatics team.