CS6104: Computational Structural Biology & Bioinformatics
Transcript of CS6104: Computational Structural Biology & Bioinformatics
CS6104: Computational Structural Biology & Bioinformatics
Homology Modeling and Molecular Docking
David BevanDept of Biochemistry
April 6, 2004
Some Components of Molecular Modeling
• Visualization• Molecular mechanics• Quantum mechanics• Molecular dynamics• Homology modeling• Molecular docking
Aims of Structural Genomics
• To determine or predict the 3D structures of all the proteins encoded in the genome
• Up to 40% of the known protein sequenceshave at least one segment related to one or more structures
=> Determine all of the folds=> Use homology modeling to
predict 3D structures
What is Homology?
• Cannot be partial• Assertion of homology is an hypothesis• Hypothesis is usually based on extent of
sequence similarity between proteins, though ultimately similar functions need to be demonstrated
Homology: having a commonevolutionary origin
Some Definitions
• Homologue (Homolog): proteins that are evolutionarily related
• Orthologue (Ortholog): homologues from different organisms
• Paralogue (Paralog): homologues from the same organism
Homology Modeling(Comparative Structure Modeling)
• 3D structures conserved to greater extent than primary structures
• Develop models of protein structure based on structures of homologues
• Using known structure as a “template”, calculate 3D model of a protein for which only know the sequence (the “target”)
• Model includes core, loops, and side chains
Steps in Homology Modeling
Template Selection• Identify protein structures related to
target and select those to be used as templates
• Involves searching a database such as at NCBI (e.g., BLAST at NCBI)
• Involves a certain amount of sequence alignment
Aligning Sequences
• Critical step in homology modeling• Many options to consider• Factors to consider
– Which algorithm to use– Which scoring method to apply– Whether and how to assign gap penalties
Algorithms for Alignment
• Earliest method was that of Needleman and Wunsch (1970)
• Another widely adopted one was by Smith and Waterman (1981)
• More recent are the heuristic methods such as BLAST and FASTA, which are more approximate but much faster
Needleman-Wunsch Algorithm• Global alignment method (i.e., match
sequences along entire length)• Produces an optimal alignment• Steps
– Setting up a matrix– Scoring the matrix– Identifying the optimal alignment
• Can be time-consuming• Not effective for highly divergent proteins
Smith and Waterman Algorithm• Local alignment method• Useful in database searching• Similar to global alignment
– Proteins arranged in a matrix– Optimal path along diagonal is sought
• Differences compared to global alignment– Can start alignment at internal position– Alignment does not have to extend to ends
Heuristic Search Methods• Developed to do rapid searches of
large databases• Not guaranteed to find globally
optimal solution• Rarely miss a significant match• Identify regions of potential interest
and then expand regions to identify alignment
• Include FASTA and BLAST
Scoring Alignments• Need some method of scoring to find optimal
alignment• Four general types of scoring have been applied
– Identity: considers only identical residues– Genetic code: considers the number of base changes in
DNA or RNA to interconvert codons for the amino acids
– Chemical similarity: considers physico-chemical properties
– Observed substitutions: considers substitution frequencies observed in alignments of sequences (*used the most*)
PAM Matrices• Dayhoff mutation data matrix originally developed to
study evolution of proteins• Uses probability of one amino acid mutating to a second
amino acid within a particular evolutionary time• Denoted PAM (Percentage of Acceptable Point Mutations)• One PAM is a unit of evolutionary divergence in which 1%
of the amino acids have been changed• Uses substitution frequencies from alignments of very
similar sequences and extrapolates to more distant relationships
• PAM40 will recognize short alignments of highly similar sequences
• PAM250 will recognize longer, weaker local alignments
BLOSUM Matrices• Matrices based on alignments of more distantly related
sequences• Uses alignments of short regions of related sequences• Sequences clustered into groups (blocks) based on
similarity at some threshold value of percentage identity
• Blocks substitution matrices (BLOSUM) derived based on substitution frequencies
• BLOSUM90 matrix derived using threshold of 90% identity => very similar sequences
• BLOSUM30 matrix derived using threshold of 30% identity => highly divergent sequences
Summary of PAM and BLOSUM Matrices
BLOSUM90
PAM30
BLOSUM 80 BLOSUM62
PAM120 PAM180BLOSUM45
PAM240
Less divergent More divergent
Mouse vs Rat
Mouse vsBacteria
Building the 3D Model• Rigid body assembly
– Rigid bodies from aligned sequences– Core region, loops, and side chains
• Segment matching– Most hexapeptide segments can be
clustered into ~100 structural classes– Use segments from homologs (or non-
homologs) to build up structure
Building the 3D Model (cont.)
• Satisfaction of spatial restraints– Generate restraints from templates– Assume distances and angles between
aligned template and target are similar– Minimize violations of all restraints using
distance geometry or optimization techniques (i.e., force field) to satisfy spatial restraints
Evaluating the 3D ModelProcheck
• Ramachandran plot• Planar peptide bonds• Side chain conformations that
correspond to those in rotamer library
• Hydrogen bonding• No bad atom-atom contacts
Evaluating the 3D Model3D-Profiler
• Based on statistical preferences of each of the 20 amino acids for particular environments within a protein
• Each residue position can be characterized by its environment
• Preferred environments for amino acids defined by three parameters– Area of each residue that is buried– Fraction of side-chain area that is covered by polar
atoms (i.e., O and N)– Local secondary structure
Refining the 3D Model
• MD and energy minimization• Application of restraints based on
experimental data (e.g., NMR, fluorescence)
Threading• Based on concepts of finite number of folds
and conservation of 3D structure• Challenge is to find fold(s) adopted by a given
sequence• Model a given amino acid sequence as each of
several folds from a pre-specified library• Models are evaluated by some criteria to
identify the most likely fold (e.g., pairwise contact “energies”)
⇒ instead of using homologues astemplates, uses library of folds
Applications of the Model
Arabidopsis thaliana Project
• 46 genes for family 1 β-glucosidases– 40 β-O-glucosidases– 6 β-S-glucosidases
• ~ 20 genes for family 35 β-galactosidases
An NSF 2010 Project
Model of Beta-Glucosidase
Red: Crystal structure Blue: Homology model
Crystal vs. Homology Model of β-Glucosidase
RMSD (heavy atoms) 2.53 Å
RMSD (backbone atoms) 1.82 Å
Possible Template Structures• Maize Glu1
– 1E1E: native enzyme– 1E1F: native enzyme in complex with PSG– 1E4L: E191D– 1E4N, 1E55, 1E56: E191D complexes
• Myrosinase– 1E4M: Sinapis alba 1.20 Angstroms– Many others
• Cyanogenic β-glucosidase from white clover - 1CBG• Bacillus polymyxa β-glucosidase
– 1BGA: native enzyme– 1BGG, 1E4I, 1TR1: complexes or mutants
• Bacillus circulans β-glucosidase - 1QOX
Approach to Homology Modeling• Identify and align templates• Align target sequences with templates
– Biology Workbench (CLUSTALW)– MODELLER4 and MODELLER6
• Build models– MODELLER4 and MODELLER6
• Evaluate models– PROCHECK– PROSAII
Evaluation of Models
0.4-10.2Average0 to 1.3-8.3 to -11.4Range
0.2-9.8At5g259800-10.1At1g60090
0.5-10.9At2g444701.0-9.8At1g662800-10.6At1g26560
0.2-11.3At5g44640
Rama - % disallowed
ProsaII z-score
Molecular Docking
• Receptor-ligand• Enzyme-substrate• Protein-DNA• Protein-protein• Receptor-drug
Attempt to predict the structure(s) of the intermolecular complex between two or more molecules.
General Considerations• Molecular representations
– Abstract or atoms– Flexible or fixed
• Juxtaposition of molecules– Interactive or automated– Search algorithm should create an optimum
number of conformations that include experimental binding modes
• Evaluation of complementarity– Scoring function– Force field energy functions
Search Algorithms• Molecular dynamics• Monte Carlo methods• Genetic algorithms• Fragment-based methods• Point complementary methods• Distance geometry methods• Tabu searches• Systematic searches
Docking Algorithms
SPROUTICMGrowMolSurflexSMoGAutoDockMCSSSLIDEGRIDFlexX/FlexELUDIDOCKDe novo DesignVirtual Screening
AutoDock• A suite of automated docking tools• Designed to predict how small
molecules, such as substrates or drug candidates, bind to a receptor
• Consists of three programs– AutoTors: to define torsions in ligand– AutoGrid: to calculate grids– AutoDock: to perform the docking– Also includes GUI called AutoDockTools
AutoDock Grid Maps• Pre-calculated and used as
look-up tables• Place probe atom at each grid
point and calculate energy• Grid for each type of atom
(e.g., C, O, N, H)• Interpolate ligand atom
positions relative to grid points• Energy based on van der
Waals, electrostatics, and hydrogen bonding
Structure-based Drug Design
• Directs discovery of a drug lead, a compound with at least micromolaraffinity for a target
• Involves combination of docking (virtual ligand screening) and experimental assays (high throughput screening)
Docking and HTS Overview
Docking HTS
Virtual Ligand Screening
Retinoid Receptors
• Members of superfamily of nuclear receptors (transcription factors)
• Two families (RARs and RXRs)• Subtypes denoted α, β, γ within each
family
Functions of Receptors
• Normal processes– Morphogenesis– Differentiation– Metabolism– Homeostasis
• Pathologies– Teratogenicity– Endocrine disruption
Natural Ligands for Retinoid Receptors
O-CH3 CH3
CH3
CH3 CH3 O
CH3 CH3
CH3
CH3
CH3
O O-
trans-Retinoic acid cis-Retinoic acid
Docking of trans-Retinoic Acid to RARγ
Blue = crystal structure
Red = re-docked ligand
Docking of RetinoidsDesignation Structure RMSD (Å)t-RA
O-CH3 CH3
CH3
CH3 CH3 O
0.30
c-RA CH3 CH3
CH3
CH3
CH3
O O-
0.37
BMS181156
O
CO
O-0.97
BMS184394-R
O
CO
O-0.29
CD564
O
CO
O-0.23
Kinesins
Eg5
• A kinesin motor protein
• Slides microtubules of developing spindle apart to pushcentrosomes apart
Monastrol• Inhibitor of Eg5• An anti-mitotic agent
Docking to Human Eg5
Docking to Drosophila Eg5