KNIME as a platform for distributed molecular property predictions
![Page 1: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/1.jpg)
Johannes Koppe, Andreas Teckentrup, Nils Weskamp
KNIME as a platform for distributed molecular property predictions
![Page 2: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/2.jpg)
Motivation
BICEPS
Synthesis ideas
Sample registration
Vendor catalogs
Virtual libraries
Screening hits
…
Predicted molecular properties
Compound filtering and prioritization
Most promising compounds
![Page 3: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/3.jpg)
BICEPS: assisting design of chemical structures
![Page 4: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/4.jpg)
BICEPS: technical implementation
PIDAMDB
CIDB
ISIS/Base Frontend
Oracle Oracle
SDF
(Conductor)
XML
…
![Page 5: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/5.jpg)
BICEPS: number of generated data points over 12 months
[Chart: number of generated data points per month for the series BICEPS and CIDB; y-axis from 0 to 20,000,000; x-axis covers 12 months, 02/10 through 01/11]
![Page 6: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/6.jpg)
BICEPS: experiences under high load conditions
• Altogether, nearly 55 million results generated over the last 12 months
• High variability in monthly system load
– Peaks in system load often due to model updates and new compound collections
• Issues of scalability, load balancing and reliability under high load conditions
– Need for manual interventions and significant maintenance efforts
– Client-Server architecture of many workflow tools leads to bottlenecks and single points of failure
– Scale-up significantly increases license costs
• Decision to evaluate fully-distributed system architecture based on KNIME
![Page 7: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/7.jpg)
Distributed KNIME setup @ BI
Interactive use:
Headless use:
Conductor: scheduling and submitting deployed KNIME jobs
Compute Nodes (multiple)
User @ Workstation
![Page 8: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/8.jpg)
Collaboration with KNIME.com
• Existing workflows rely heavily on various external tools for structure manipulation and descriptor calculation
• Generic integration of external tools into KNIME necessary (incl. optional parallel cluster execution)
– Workflow migration to KNIME without switching the underlying chemistry tools
• Sponsoring of a generic “External Tool” node/framework
– Specifications by BI in collaboration with KNIME.com
– Implementation by KNIME.com
– Code part of the KNIME open source release
• Adaptation of the generic framework for specific tools and setup @ BI
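The idea of a generic external-tool wrapper can be illustrated with a minimal Python sketch. It is not the KNIME "External Tool" node itself: the function name `run_external_tool`, the `{in}`/`{out}` placeholders, and the stand-in tool are all assumptions made for illustration. The pattern shown is the usual one for file-based command-line tools: write the input to a temporary file, run the tool, and read its output file back.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external_tool(command, input_text, in_suffix=".sdf", out_suffix=".sdf"):
    """Run a file-based command-line tool and return the content of its output file.

    `command` is a list of arguments in which the placeholders "{in}" and
    "{out}" are replaced by the paths of temporary input and output files.
    (Hypothetical helper, not part of KNIME.)
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = Path(workdir) / f"input{in_suffix}"
        out_path = Path(workdir) / f"output{out_suffix}"
        in_path.write_text(input_text)
        argv = [arg.replace("{in}", str(in_path)).replace("{out}", str(out_path))
                for arg in command]
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(argv, check=True, capture_output=True, text=True)
        return out_path.read_text()


# Stand-in "tool": a Python one-liner that copies its input in upper case.
# A real setup would use the chemistry tool's own command line instead.
demo_tool = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
    "{in}", "{out}",
]

result = run_external_tool(demo_tool, "ethanol\n")
print(result)
```

Because the wrapper only sees a command line and two files, a workflow step can keep using the same underlying tool while the surrounding workflow system changes.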
![Page 9: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/9.jpg)
External tool integration incl. cluster execution
Generic ExtTool Framework
Default Executor (local job execution)
BI SGE Executor (DRMAA-based job submission to SGE)
SSH Remote Executor
Corina-specific Node Customizer (Options, GUI, Filetypes)
…
Daylight-specific Node Customizer (Options, GUI, Filetypes)
Moe-specific Node Customizer (Options, GUI, Filetypes)
…
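The separation shown on this slide, with executors deciding where a job runs and customizers describing a specific tool, can be sketched as follows. This is a hedged illustration in Python, not the framework's actual Java API: the class names `Executor`, `LocalExecutor`, `GridExecutor`, and `ToolCustomizer` are hypothetical, and the grid back end is left as an unimplemented placeholder.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Executor(ABC):
    """Decides where a command line runs; knows nothing about the tool."""

    @abstractmethod
    def execute(self, argv):
        ...


class LocalExecutor(Executor):
    """Runs the command on the local machine and returns its standard output."""

    def execute(self, argv):
        done = subprocess.run(argv, check=True, capture_output=True, text=True)
        return done.stdout


class GridExecutor(Executor):
    """Placeholder for a cluster back end such as SGE; not implemented here."""

    def execute(self, argv):
        raise NotImplementedError("submit argv to the grid engine here")


@dataclass
class ToolCustomizer:
    """Tool-specific part: executable, fixed options, expected input file type."""
    executable: str
    options: list = field(default_factory=list)
    input_type: str = "sdf"

    def build_command(self, extra_args):
        return [self.executable, *self.options, *extra_args]


def run_tool(customizer, executor, extra_args):
    """Combine any customizer with any executor."""
    return executor.execute(customizer.build_command(extra_args))


# Stand-in tool: a Python one-liner that echoes its first argument.
echo_tool = ToolCustomizer(executable=sys.executable,
                           options=["-c", "import sys; print(sys.argv[1])"])
output = run_tool(echo_tool, LocalExecutor(), ["CCO"])
print(output)
```

With this split, adding a new tool means writing one customizer, and adding a new execution back end means writing one executor; neither change touches the other side.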
![Page 10: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/10.jpg)
External tool integration incl. cluster execution
Corina options
Batch creation for parallel execution
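Batch creation for parallel execution can be pictured as splitting the input structures into fixed-size chunks that are then processed independently. The sketch below assumes SD file input, where each record ends with a line containing `$$$$`; the function name `split_sdf` and the toy records are assumptions for illustration, not the node's actual implementation.

```python
def split_sdf(sdf_text, batch_size):
    """Split SDF text into batches of at most `batch_size` records.

    A record is everything up to and including a "$$$$" terminator line.
    """
    records, current = [], []
    for line in sdf_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    # Keep trailing content that lacks a terminator instead of dropping it.
    if "".join(current).strip():
        records.append("".join(current))
    return ["".join(records[i:i + batch_size])
            for i in range(0, len(records), batch_size)]


# Five toy records (names only, no real connection tables).
sdf_text = "".join(f"mol{i}\n$$$$\n" for i in range(1, 6))
batches = split_sdf(sdf_text, batch_size=2)
print(len(batches))
```

Each batch is itself valid input for the same tool, so batches can be handed to separate jobs and the outputs concatenated afterwards in the original order.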
![Page 11: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/11.jpg)
External tool integration incl. cluster execution
Local execution vs. grid engine submission
Error handling and automatic re-submission of failed jobs
Automatic queue selection based on external tool characteristics
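The two behaviours named on this slide, automatic re-submission of failed jobs and queue selection based on tool characteristics, can be sketched in a few lines. Everything specific here is an assumption made for illustration: the queue names `short` and `long`, the per-record runtime figures, the time limit, and the function names are hypothetical and do not describe the actual BI setup.

```python
import time

# Made-up runtime estimates (seconds per record) for illustration only.
SECONDS_PER_RECORD = {"corina": 0.5, "moe": 5.0}


def select_queue(tool, n_records, short_limit_seconds=600):
    """Pick a queue from the estimated runtime of the whole batch."""
    estimate = SECONDS_PER_RECORD.get(tool, 1.0) * n_records
    return "short" if estimate <= short_limit_seconds else "long"


def run_with_resubmission(job, max_attempts=3, wait_seconds=0.0):
    """Call `job` until it succeeds; return (result, attempts used)."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except Exception as error:  # any failure triggers a re-submission
            last_error = error
            time.sleep(wait_seconds)
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


# Simulated job that fails twice and then succeeds.
calls = {"n": 0}


def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated node failure")
    return "done"


result, attempts = run_with_resubmission(flaky_job)
print(result, attempts)
print(select_queue("corina", 1000), select_queue("moe", 1000))
```

Bounding the number of attempts matters in practice: a job that fails for a deterministic reason, such as malformed input, would otherwise be re-submitted forever.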
![Page 12: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/12.jpg)
External tool integration incl. cluster execution
![Page 13: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/13.jpg)
Example workflow – property calculation (PROP4)
![Page 14: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/14.jpg)
Current status
• Most of the necessary external chemistry tools available within KNIME
• Large number of workflows for property predictions ported to KNIME
– Testing under realistic conditions yielded promising results
– Identical results from old and new workflows
– Switch to production use of KNIME expected soon
• Rollout of KNIME within the computational chemistry group for interactive usage
– First end-user training completed
– Extensive evaluation within the group currently under way
– Received a lot of feedback, mainly on usability issues
– Decision on production use of KNIME in this application expected in 2011
![Page 15: KNIME as a platform for distributed molecular property predictions](https://reader031.fdocuments.in/reader031/viewer/2022012011/613d0e1f736caf36b758c7e7/html5/thumbnails/15.jpg)
Acknowledgements
• Oliver Wissdorf
• Bernd Wiswedel (KNIME.com)