Open Source Cheminformatics in KNIME with the RDKit Nodes


Manuel Schwarze, NIBR IT

Novartis Institutes for BioMedical Research, Basel

6th KNIME Users Group Meeting

Zurich, March 6-8, 2013

Frozen Tinguely Fountain in Basel


RDKit: What Is It?

Python (2.x), Java, C++ toolkit for cheminformatics

• Core data structures and algorithms in C++

• Heavy use of Boost libraries

• Python wrapper generated using Boost.Python

Functionality:

• 2D and 3D molecular operations

• Descriptor generation for machine learning

• Database cartridge for substructure and similarity searching

• Supports Mac/Windows/Linux
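The descriptor and similarity functionality listed above is also reachable directly through the Python wrapper. The sketch below is a minimal illustration and assumes a current RDKit release for Python 3 (e.g. installed with `pip install rdkit`), not the Python 2.x builds of 2013. The example molecules, the descriptor selection and the use of Morgan fingerprints (radius 2, 2048 bits) are illustrative assumptions. The in-memory Tanimoto comparison only stands in for what the database cartridge does inside PostgreSQL; it is not the cartridge's implementation.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator


def describe(smiles):
    """Return a few simple descriptors for one molecule (illustrative selection)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return {
        "MolWt": Descriptors.MolWt(mol),          # average molecular weight
        "HeavyAtoms": mol.GetNumHeavyAtoms(),     # non-hydrogen atoms
        "Rings": mol.GetRingInfo().NumRings(),    # rings found by SSSR perception
    }


def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto similarity of Morgan bit-vector fingerprints (assumed settings)."""
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fps = []
    for smiles in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smiles}")
        fps.append(generator.GetFingerprint(mol))
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])


if __name__ == "__main__":
    print(describe("c1ccccc1O"))                  # phenol
    print(tanimoto("c1ccccc1O", "c1ccccc1N"))     # phenol vs. aniline
```

Identical molecules give a similarity of 1.0; structurally related ones such as phenol and aniline fall somewhere between 0 and 1.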

History:

• 2000-2006: Developed and used at Rational Discovery for building predictive models for ADME, toxicity and biological activity

• June 2006: Open-source (BSD license) release of the software; Rational Discovery shuts down

• Since then: open-source development continues; the RDKit is used within Novartis, and Novartis contributes back to the open-source version


KNIME Integration¹

Out of the box, KNIME is strong in data processing and mining but weak in chemistry

Goal: Develop a set of open-source RDKit-based nodes for KNIME that provide basic cheminformatics functionality

¹ Work done together with knime.com

Distributed as KNIME community nodes

Binaries available as a KNIME plug-in (no RDKit build/installation required)

Complete refactoring released in 2012:

• GUI alignment

• Improved processing speed

RDKit Node Wizard released in 2012

Work in progress:

• More nodes being added

• Existing nodes being improved


What’s There? (legend: New, Updated, Changed)


NEW: RDKit to InChI

• Conversion is based on the official IUPAC InChI reference library, which was recently integrated into the RDKit

• Many «switches» are available to experts on the «Advanced» tab to influence the conversion results
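The conversion the node performs can be sketched with the RDKit's own InChI functions. This is a minimal illustration assuming a current RDKit build with InChI support (the PyPI `rdkit` package includes it); the helper name `smiles_to_inchi` is hypothetical. The `options` string is handed to the IUPAC library, and an empty string yields standard InChI; whether the node's «Advanced» switches map one-to-one onto this string is not stated in the slides.

```python
from rdkit import Chem


def smiles_to_inchi(smiles, options=""):
    """Hypothetical helper: SMILES -> (InChI, InChIKey) via the RDKit.

    `options` is passed through to the IUPAC InChI library; leaving it
    empty produces a standard InChI ("InChI=1S/...").
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    inchi = Chem.MolToInchi(mol, options=options)
    key = Chem.InchiToInchiKey(inchi)   # hashed, fixed-length form of the InChI
    return inchi, key


if __name__ == "__main__":
    print(smiles_to_inchi("Oc1ccccc1"))  # phenol
```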


NEW: InChI to RDKit
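The reverse direction can be sketched the same way. This assumes a current RDKit with InChI support; the helper name `inchi_to_smiles` is hypothetical and only illustrates the RDKit call, not the node's internals. The example input is the standard InChI of phenol.

```python
from rdkit import Chem


def inchi_to_smiles(inchi):
    """Hypothetical helper: parse an InChI and return RDKit canonical SMILES."""
    mol = Chem.MolFromInchi(inchi)   # sanitizes and removes explicit Hs by default
    if mol is None:
        raise ValueError(f"could not parse InChI: {inchi}")
    return Chem.MolToSmiles(mol)


if __name__ == "__main__":
    print(inchi_to_smiles("InChI=1S/C6H6O/c7-6-4-2-1-3-5-6/h1-5,7H"))
```

Because InChI normalizes mobile hydrogens, a round trip through InChI does not always reproduce the input tautomer, so comparing canonical SMILES is a reasonable check for simple molecules only.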


NEW: IUPAC to RDKit

• Conversion is based on the OPSIN Java library developed at Cambridge University, UK


NEW: RDKit Highlighting Atoms

• Output is an SVG image of the structure with the highlighted atoms
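A comparable SVG rendering can be produced with the RDKit drawing code. The sketch assumes a current RDKit for Python 3; selecting the atoms to highlight via a SMARTS query is an illustrative choice, since the slides do not say how the node receives its atom list, and the helper name `highlight_svg` is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D


def highlight_svg(smiles, smarts, width=300, height=300):
    """Hypothetical helper: draw a molecule as SVG, highlighting SMARTS matches."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    if mol is None or query is None:
        raise ValueError("invalid SMILES or SMARTS")
    # Union of the atom indices taking part in any match of the query
    atoms = sorted({idx for match in mol.GetSubstructMatches(query) for idx in match})
    rdDepictor.Compute2DCoords(mol)            # 2D layout for depiction
    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    drawer.DrawMolecule(mol, highlightAtoms=atoms)
    drawer.FinishDrawing()
    return drawer.GetDrawingText(), atoms


if __name__ == "__main__":
    svg, atoms = highlight_svg("Oc1ccccc1", "[OX2H]")  # highlight the hydroxyl O
    print(atoms)
```

The returned string can be written to a `.svg` file or embedded directly in an HTML report.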


NEW: RDKit Interactive Table

• To be used like the KNIME Interactive Table, with all the features it offers, e.g. hiliting

• Additionally, it can show extra information in column headers; currently this is limited to structures based on SMILES values

• Output table = input table

• Also used as the direct view in some RDKit nodes, e.g. in the RDKit Substructure Counter


NEW: RDKit SMILES Headers

• Possible operations for SMILES values in column headers: setting, replacing, deleting, retrieving

• First input and output port: data table

• Optional second input table: SMILES definitions and column assignments

• Second output table: SMILES definitions of the data table after execution


UPDATE: Molecule to RDKit


UPDATE: RDKit to Molecule


UPDATE: RDKit Substructure Counter
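The slides do not describe how the node counts matches internally; the sketch below only shows the underlying RDKit operation, counting the unique matches of each SMARTS query in one molecule. It assumes a current RDKit for Python 3, and the helper name `count_substructures` and the query names are hypothetical.

```python
from rdkit import Chem


def count_substructures(smiles, queries):
    """Hypothetical helper: count unique matches of each named SMARTS query."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    counts = {}
    for name, smarts in queries.items():
        query = Chem.MolFromSmarts(smarts)
        if query is None:
            raise ValueError(f"could not parse SMARTS: {smarts}")
        # GetSubstructMatches returns unique atom-index tuples by default
        counts[name] = len(mol.GetSubstructMatches(query))
    return counts


if __name__ == "__main__":
    # Glycerol: three hydroxyl groups, three carbons
    print(count_substructures("OCC(O)CO", {"hydroxyl": "[OX2H]", "carbon": "[#6]"}))
```

Note that `GetSubstructMatches` stops at `maxMatches` (1000 by default), which only matters for very large molecules or very generic queries.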


What Is Coming Next?

Integration of the new KNIME Molecule Type into the RDKit nodes

Suggestions for new features and improvements are welcome

Please post to the KNIME Community RDKit Forum: http://tech.knime.org/forum/rdkit


Acknowledgements

Novartis:

• Greg Landrum (NIBR IT)

• David Nick (NIBR IT)

• Eddie Cao (NIBR IT)

• Marc Litherland (NIBR IT)

• Dan Karavakis (NIBR IT)

Rational Discovery:

• Santosh Putta (currently at Nodality)

• Julie Penzotti

knime.com:

• Michael Berthold

• Thorsten Meinl

• Bernd Wiswedel

• Thomas Gabriel

• Peter Ohl

RDKit open-source community

KNIME forum members:

• Simon Richards

• James Davidson

• Steve Roughley

• Ed1

• many others ...