Docking@GRID – A Web Portal for Massively Parallel Flexible Docking, using the ChemAxon toolkit.
Dragos Horvath*, Kun Attila, Benjamin Parent*, Cyrielle Boutroue#, Even Gaël#, Alexandru Tantar#, Nouredine Melab#, Sylvaine Roy & El-Ghazali Talbi#
* UMR 8576, CNRS – Univ. Lille 1, FR
Chemistry Dept, Univ. Babes-Bolyai, Cluj, RO
# LIFL CNRS/INRIA – Univ. Lille 1, FR
DSV/iRTSV - CEA, Grenoble, FR
Outline…
• The goal: automated fully flexible docking on computer grids – GRID5000, http://www.grid5000.fr
– Specific conformational sampling & docking software based on hybrid genetic algorithms
– Upfront chemoinformatics tools to preprocess submitted ligands.
– Upfront tools to define the active site and its key degrees of freedom (!)
– Interface to start docking calculations & analyze results.
Genetic Algorithm-driven Conformational Sampling Tool
• Based on a Genetic Algorithm, coding conformers as "chromosomes" in which each locus stands for a torsional angle value.
• The In Silico Darwinian Evolution, leading to fitter and fitter (lower energy) conformers, was enhanced by:
– Hybridization with various optimization heuristics
– Fine-tuning of the parameters controlling the evolutionary strategy
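The chromosome encoding described above can be illustrated with a minimal sketch. It assumes torsions stored in degrees, one-point crossover and a uniform random-step mutation; the class name `TorsionChromosome`, its methods and all parameters are hypothetical and are not taken from the actual sampling software, whose operators and hybrid heuristics are not detailed in the slides.

```java
import java.util.Random;

/** Sketch only: a conformer encoded as an array of torsion angles (degrees). */
final class TorsionChromosome {
    final double[] torsions; // one locus per rotatable bond, kept in [-180, 180)

    TorsionChromosome(double[] torsions) {
        this.torsions = torsions.clone();
    }

    /** Wraps any angle into [-180, 180). */
    static double wrap(double angle) {
        double a = ((angle + 180.0) % 360.0 + 360.0) % 360.0;
        return a - 180.0;
    }

    /** One-point crossover: loci before 'cut' come from this parent, the rest from 'other'. */
    TorsionChromosome crossover(TorsionChromosome other, int cut) {
        double[] child = new double[torsions.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = (i < cut) ? torsions[i] : other.torsions[i];
        }
        return new TorsionChromosome(child);
    }

    /** Mutation: each locus is shifted by a random step with probability 'rate'. */
    TorsionChromosome mutate(Random rng, double rate, double maxStepDeg) {
        double[] out = torsions.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] = wrap(out[i] + (2.0 * rng.nextDouble() - 1.0) * maxStepDeg);
            }
        }
        return new TorsionChromosome(out);
    }

    public static void main(String[] args) {
        TorsionChromosome a = new TorsionChromosome(new double[] {60.0, 180.0, -60.0});
        TorsionChromosome b = new TorsionChromosome(new double[] {-60.0, 60.0, 180.0});
        TorsionChromosome child = a.crossover(b, 2).mutate(new Random(42), 0.3, 30.0);
        System.out.println(java.util.Arrays.toString(child.torsions));
    }
}
```

In a real run the fitness of each chromosome would be the force field energy of the conformer it encodes; that evaluation is left out here.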
Customized CVFF force field, employing:
• a 10 Å cutoff (with a termination function)
• a smoothing procedure to avoid interatomic clashes
• a continuum solvent model
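[Equations: a Coulomb term (E_coulomb) and a continuum solvation term (E_Solv, with coefficients k_solv and k_hphob), written in the symbols Q, V and d_ij.]
[Figure: ‘smoothing’ distance d_ij versus effective interatomic distance d0_ij.]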
GRID 5000-based ‘Planetary’ Model (www.grid5000.fr)
• If (free node): DEPLOY Island Model
• Deployed to the node: Executables, Molecule File, Constraint Files, Seeds List, Taboo List, Operational Pars
• Returned from the node: Stablest Chromosomes, Sampling Success Score
• Solution Merger & Clusterer → Conformer & Cluster Database
• ‘Panspermia’ policy center: ‘recent’ clusters: seeds; ‘old’ clusters: taboo
• Operational Pars Selector: Sampling Success vs. Operational Pars
• Stop: max. ‘Mission Nr.’, or no new clusters since N ‘missions’
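The seed/taboo policy and the two stop criteria of the planetary model could be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how ‘recent’ and ‘old’ clusters are told apart, so an age threshold in missions (`recentWindow`) is assumed here, and the class `PanspermiaPolicy` with all its fields and methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: seed/taboo bookkeeping and stop test for a mission-based sampler. */
final class PanspermiaPolicy {

    /** A conformer cluster, tagged with the mission that first found it. */
    static final class Cluster {
        final int id;
        final int foundAtMission;

        Cluster(int id, int foundAtMission) {
            this.id = id;
            this.foundAtMission = foundAtMission;
        }
    }

    final int recentWindow; // assumption: clusters younger than this count as 'recent'
    final int maxMissions;  // stop criterion: max. 'Mission Nr.'
    final int patience;     // stop criterion: N missions without a new cluster

    PanspermiaPolicy(int recentWindow, int maxMissions, int patience) {
        this.recentWindow = recentWindow;
        this.maxMissions = maxMissions;
        this.patience = patience;
    }

    boolean isRecent(Cluster c, int currentMission) {
        return currentMission - c.foundAtMission < recentWindow;
    }

    /** 'Recent' clusters are handed out as seeds for the next missions. */
    List<Cluster> seeds(List<Cluster> all, int currentMission) {
        List<Cluster> out = new ArrayList<>();
        for (Cluster c : all) {
            if (isRecent(c, currentMission)) out.add(c);
        }
        return out;
    }

    /** 'Old' clusters are declared taboo, i.e. not to be revisited. */
    List<Cluster> taboo(List<Cluster> all, int currentMission) {
        List<Cluster> out = new ArrayList<>();
        for (Cluster c : all) {
            if (!isRecent(c, currentMission)) out.add(c);
        }
        return out;
    }

    /** Stops at the mission limit, or when no new cluster appeared for 'patience' missions. */
    boolean shouldStop(List<Cluster> all, int currentMission) {
        if (currentMission >= maxMissions) return true;
        int lastDiscovery = 0;
        for (Cluster c : all) {
            lastDiscovery = Math.max(lastDiscovery, c.foundAtMission);
        }
        return currentMission - lastDiscovery >= patience;
    }

    public static void main(String[] args) {
        PanspermiaPolicy policy = new PanspermiaPolicy(3, 100, 5);
        List<Cluster> db = new ArrayList<>();
        db.add(new Cluster(1, 1));
        db.add(new Cluster(2, 9));
        System.out.println("seeds: " + policy.seeds(db, 10).size()
                + ", taboo: " + policy.taboo(db, 10).size()
                + ", stop: " + policy.shouldStop(db, 10));
    }
}
```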
• Ab initio folding of Trp cage 1L2Y: native structure (reproducibly) found and ranked as most stable. Planetary model used max. 20 nodes for 4…5 days
Conformer # 1, RMS ~1.8 Å - good match to native structure
• Ab initio folding of Trp zipper 1LE1: native structure found and ranked as most stable. Planetary model used max. 20 nodes for 4…5 days
Conformer # 1, RMS ~0.8 Å - perfect match to native structure
• However, there is a high risk that almost well folded solutions, being declared taboo, block the access to the correct fold!!
Conformer # 79, RMS ~2.4 Å - near-optimal fold closest to native structure
Conformer # 1, RMS ~3.8 Å - is a poor match of the native structure
Ligand Preprocessor…
Ligand File Upload
Standardize
JChem DataBase: Canonical SMILES
If new…
User Toggle:
– Main Tautomer & Major µSpecies
– Main Tautomer & Key µSpecies (occurrence > m%)
– All Tautomers & Major µSpecies
– All Tautomers & Key µSpecies
Add Explicit H
Partial Charge Calculation
Force Field Typing (PMapper)
Generate Conformer(s)
Dockable Conformer Families
A selector of the top N most likely tautomeric forms would be of outstanding help here – many among the enumerated tautomers are chemically meaningless!
Potential problems with resonant structures in the ChargePlugin:
    try {
        ChgPlug.setTakeResonantStructure(true);
        chgMol = ChgPlug.setMolecule(currSpec, false, false);
        ChgPlug.run();
        …
    } catch (Exception ResonantStructureFailed) {
        try {
            ChgPlug.setTakeResonantStructure(false);
            …
        } catch (Exception WhateverYouDoItBreaks) {
            …
        }
    }
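The retry logic of the snippet above, isolated from the toolkit, might look like the sketch below. The `ChargeCalculator` interface is a hypothetical stand-in and is NOT the ChemAxon API; the only point is the control flow: try with resonant structures first, retry without them, and report which mode succeeded so that downstream steps can flag the ligand.

```java
/** Sketch only: try a calculation with resonance handling, fall back without it. */
final class ChargeFallback {

    /** Hypothetical stand-in for a partial charge calculator; not a real toolkit API. */
    interface ChargeCalculator {
        double[] compute(String molecule, boolean useResonance);
    }

    /** The charges plus a flag recording which mode produced them. */
    static final class Result {
        final double[] charges;
        final boolean resonanceUsed;

        Result(double[] charges, boolean resonanceUsed) {
            this.charges = charges;
            this.resonanceUsed = resonanceUsed;
        }
    }

    static Result computeWithFallback(ChargeCalculator calc, String molecule) {
        try {
            return new Result(calc.compute(molecule, true), true);
        } catch (RuntimeException resonanceFailed) {
            try {
                return new Result(calc.compute(molecule, false), false);
            } catch (RuntimeException bothFailed) {
                // Keep the first failure attached so neither error is lost.
                if (bothFailed != resonanceFailed) {
                    bothFailed.addSuppressed(resonanceFailed);
                }
                throw bothFailed;
            }
        }
    }

    public static void main(String[] args) {
        // Toy calculator that fails whenever resonance handling is requested.
        ChargeCalculator flaky = (mol, resonance) -> {
            if (resonance) throw new IllegalStateException("resonance step failed");
            return new double[] {0.25, -0.25};
        };
        Result r = computeWithFallback(flaky, "CC(=O)O");
        System.out.println("resonance used: " + r.resonanceUsed);
    }
}
```

Recording the mode in the result, rather than silently swallowing the first exception, keeps ligands charged without resonance handling identifiable later on.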
Using PMapper to assign CVFF types to ligand atoms
• required SMARTS encoding of the CVFF ‘templates’ corresponding to local neighborhoods defining each potential type
Issues yet to be settled:
• use the Conformer Plugin to generate several hundreds of geometries
– Conformer diversity control?
– How many degrees of freedom can be handled without significant risk of missing key minima?
– Docking will use a different force field – how ‘compatible’ are ConformerPlugin & CVFF energies?
• use the Conformer Plugin to generate a starting geometry, then use a ligand-specific GA-driven sampling engine to explore the phase space.
Active Site Definition…
Ligand..
Fixed protein residues
Fixed backbone, mobile sidechains
Flexible loop: backbone (… but not …) & sidechains
This part of the backbone is a « frozen » part of the flexible loop: rigid body rototranslations
Formally « break » bond to unlock degrees of freedom in loop
Protein Preprocessing Tools…
• At this point, the user has to explicitly provide:
– A BioSym .car protein file, with correct protonation states, partial charges and force field types for all protein atoms
– A list of fixed atoms
– A list of explicitly ‘broken’ bonds to enable sampling of ring and fixed-end loop geometries
– A list of active torsional degrees of freedom (otherwise, all potentially rotatable exocyclic single bonds will be considered)
• Will MarvinSpace evolve so as to allow graphical input of the above-mentioned information?
• Would the Charge Plugin, the MicroSpecies Plugin and PMapper work upon input of a .pdb file?
• JChem Database of defined active sites and their sampled unbound state geometries…
The Dock Manager
• In an ideal world, an academic user may add their own molecule collections to the database, but should be allowed to try docking other people’s molecules as well…
– Paranoia Manager: who’s allowed to dock ‘my’ compounds and use ‘my’ active sites?
– Make use of JChem facilities to search the ligand database by canonical structures, and return all the conformers of associated µSpecies/Tautomers. Chemoinformatic filters welcome, even based on the Holy Rule of Five!
• Methodological progress on the docking algorithms still required:
– Is rigid docking of each of ~10² ligand conformers into each one of the ~10⁴ active site geometries feasible? Would it be equivalent to flexible docking?
– How to score: free energy based on docked vs. unbound ensembles? What about µSpecies & Tautomer penalties?
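One textbook way to read "free energy based on docked vs. unbound ensembles" is a Boltzmann-weighted free energy per ensemble, F = −kT ln Σ exp(−E_i/kT), with the score taken as F(docked) − F(unbound). The sketch below assumes exactly that, which is not necessarily the scoring the project will adopt; the class `EnsembleFreeEnergy`, its method names and the kT value (about 0.593 kcal/mol near 298 K) are illustrative choices, and µSpecies or tautomer penalties are left out.

```java
/** Sketch only: Boltzmann-ensemble free energies from lists of conformer energies. */
final class EnsembleFreeEnergy {

    /** kT in kcal/mol at roughly 298 K (assumed unit system). */
    static final double KT_298 = 0.593;

    /** F = -kT ln sum_i exp(-E_i / kT), evaluated with the minimum energy factored out. */
    static double freeEnergy(double[] energies, double kT) {
        if (energies.length == 0) {
            throw new IllegalArgumentException("empty ensemble");
        }
        double min = Double.POSITIVE_INFINITY;
        for (double e : energies) {
            min = Math.min(min, e);
        }
        double sum = 0.0;
        for (double e : energies) {
            sum += Math.exp(-(e - min) / kT);
        }
        return min - kT * Math.log(sum);
    }

    /** Score = F(docked ensemble) - F(unbound ensemble); more negative means tighter binding. */
    static double bindingScore(double[] docked, double[] unbound, double kT) {
        return freeEnergy(docked, kT) - freeEnergy(unbound, kT);
    }

    public static void main(String[] args) {
        double[] docked = {-42.0, -41.5, -39.0};  // made-up energies, kcal/mol
        double[] unbound = {-35.0, -34.8, -34.1}; // made-up energies, kcal/mol
        System.out.println(bindingScore(docked, unbound, KT_298));
    }
}
```

Factoring out the minimum energy before exponentiating avoids overflow when energies are large and negative; the ensemble with many low-lying states gets a lower free energy than its single best pose alone would suggest.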
Docked Conformer Visualization
Conclusions & Perspectives
• This is a long-term ANR-funded public research project: http://dockinggrid.gforge.inria.fr/
• The primary goal is developing efficient GRID-based conformational sampling & docking methodologies
– http://paradiseo.gforge.inria.fr/ to provide the core routines for parallel evolutionary computing
• However, chemically meaningful ligand and active site management is as important as the docking step!
– ChemAxon tools for ligand standardizing, protonation, charge & force field management, 3D-buildup, storage & retrieval, visualizing,…, are perfectly suited!
– Progress needed on macromolecule & active site management.
THANKS