Protein Structure Prediction Samantha Chui Oct. 26, 2004.
Central Dogma of Biology
Question: Given a protein sequence, to what conformation will it fold?
DNA sequence → (transcription & translation) → Protein sequence → (folding) → Protein structure
How does nature do it?
Hydrophobicity vs. hydrophilicity
Van der Waals interaction
Electrostatic interaction
Hydrogen bonds
Disulfide bonds
Current Approaches
Experimental Methods
  X-ray crystallography
  NMR spectroscopy
Computational Methods
  Homology modeling
    Similar sequences fold into similar structures
  Threading
    Dissimilar sequences may fold into similar structures
  Ab initio
    No similarity assumptions
    Conformational search
Assembly of sub-structural units
[Figure: known structures, …, fragment library, protein sequence, predicted structure]
“Small Libraries of Protein Fragments Model Native Protein Structures Accurately,” Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002
Goal: Find finite set of protein fragments that can be used to construct accurate discrete conformations for any protein
1. Generate fragments from known proteins
2. Cluster fragments to identify common structural motifs
3. Test library accuracy on proteins not in the initial set
Datasets of protein fragments
  200 unique protein domains from the Protein Data Bank (PDB)
    36,397 residues
  Four sets of backbone fragments
    Fragments 4, 5, 6, and 7 residues long
  Divide each protein domain into consecutive fragments of length f, beginning at a random initial position
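The fragment-extraction step can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes a domain is a plain list of residues, that the random initial position is an offset between 0 and f - 1, and that any trailing remainder shorter than f is dropped. The name `split_into_fragments` and the `rng` parameter are illustrative.

```python
import random

def split_into_fragments(residues, f, rng=random):
    """Cut residues into consecutive, non-overlapping fragments of length f.

    The starting offset is drawn uniformly from [0, f - 1] (an assumption);
    a trailing remainder shorter than f is discarded.
    """
    if len(residues) < f:
        return []
    start = rng.randrange(f)
    return [residues[i:i + f] for i in range(start, len(residues) - f + 1, f)]

# Toy usage: a 23-residue "domain" labelled by residue index.
fragments = split_into_fragments(list(range(23)), 5, random.Random(0))
```

Passing a seeded `random.Random` makes the split reproducible, which helps when comparing libraries built from the same dataset.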
Fragment structural similarity
Coordinate root-mean-square (cRMS) deviation of Cα atoms
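$\mathrm{cRMS}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^2}$
  d_i: distance between the ith pair of corresponding atoms; N: number of atom pairs
  One-to-one mapping between atoms in structure A and structure B
  Translate and rotate to find the best alignment
  0 if the structures superimpose perfectly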
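The cRMS formula itself is short enough to write out. The sketch below assumes the two fragments have already been superimposed, so it covers only the final distance calculation and not the translate-and-rotate alignment step; the function name `crms` and the tuple-based coordinate format are illustrative choices.

```python
import math

def crms(a, b):
    """cRMS between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes b has already been optimally superimposed onto a; no rotation
    or translation is applied here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    # Sum of squared distances d_i^2 over all matched atom pairs.
    total = sum((pa - pb) ** 2 for p, q in zip(a, b) for pa, pb in zip(p, q))
    return math.sqrt(total / len(a))

# Two atoms, each displaced by distance 1 from its partner.
value = crms([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 1, 0)])
```

A full implementation would first find the optimal superposition (for example with a least-squares rotation fit) before calling a function like this one.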
Pruning and clustering
Outliers have large cRMS deviation from all other fragments
  Discard according to a fragment-length-specific threshold
k-means simulated annealing clustering
  Repeatedly run k-means clustering, merge nearby clusters, and split dispersed clusters
  Scoring function: total variance = $\sum (x - \mu)^2$
  Less sensitive to initial choice of cluster centers than k-means
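The total variance score can be illustrated in a few lines. This sketch assumes each fragment is represented as a fixed-length numeric vector and that μ is the mean of the cluster a point belongs to; it shows only the scoring function, not the merge/split annealing procedure. The names `centroid` and `total_variance` are illustrative.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def total_variance(clusters):
    """Sum, over all clusters, of squared distances from each member to its cluster mean."""
    total = 0.0
    for members in clusters:
        mu = centroid(members)
        total += sum((x - m) ** 2 for p in members for x, m in zip(p, mu))
    return total

# Two clusters: one with two points around (1, 0), one singleton.
score = total_variance([[(0, 0), (2, 0)], [(5, 5)]])
```

A lower score means tighter clusters, so a clustering run would keep a merge or split only if it helps reduce this value (subject to the annealing schedule, which is not modelled here).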
Compiling the libraries
Select cluster centroids as library entries
  Minimum sum of cRMS deviations from all the other cluster fragments
  Form a representative set of protein fragments
Library contents are highly dependent upon the clustering procedure
  For each set of fragments, start with 50 random seeds and choose the library with minimal total variance score
Evaluating quality of a library
Local-fit
  How well the library fits the local conformation of all proteins in the test set
Global-fit
  How well the library fits the global three-dimensional conformation of all proteins in the test set
Local-fit method
Protein structures broken into set of all overlapping fragments of length f
Find for each protein fragment the most similar fragment in the library (cRMS)
Score = Average cRMS value over all fragments in all proteins in the test set
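The local-fit score described above can be sketched as follows. The `distance` argument is a stand-in for cRMS after optimal superposition; here it is passed in as a callable so the example stays self-contained. The function names and the toy one-dimensional data are illustrative assumptions.

```python
def overlapping_fragments(chain, f):
    """All overlapping windows of length f along a chain."""
    return [chain[i:i + f] for i in range(len(chain) - f + 1)]

def local_fit_score(proteins, library, f, distance):
    """Average, over every overlapping fragment of every protein, of the
    distance to the closest library entry."""
    best = [min(distance(frag, entry) for entry in library)
            for chain in proteins
            for frag in overlapping_fragments(chain, f)]
    if not best:
        raise ValueError("no fragments of length f in the test set")
    return sum(best) / len(best)

# Toy data: "fragments" are number lists, and the distance is a placeholder.
toy_distance = lambda a, b: abs(sum(a) - sum(b))
score = local_fit_score([[0, 1, 2, 3]], [[0, 1], [2, 3]], 2, toy_distance)
```

In a real evaluation the chains would hold Cα coordinates and `distance` would superimpose the two fragments before computing cRMS.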
Global-fit method
Concatenate best local-fit library fragments just found
Determine fragment’s orientation by superimposing its first three Cα atoms onto last three Cα atoms of preceding fragment
Global-fit method
Number of possible sequences of fragments exponential in protein’s length
Greedy algorithm finds a good rather than the best global-fit approximation
  Start at the N terminus, approximate increasingly larger segments of the protein
  Concatenate the library fragment that yields the structure of minimal cRMS deviation from the corresponding segment
  Deterministic, linear time
Global-fit results
Library                     States/residue   cRMS
100 fragments, 5 residues   10               0.91 Å
20 fragments, 5 residues    4.47             1.85 Å
50 fragments, 7 residues    2.66             2.78 Å
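The states/residue figures are numerically consistent with taking the library size to the power 1/(f - 3), which would follow if each appended fragment overlaps three Cα atoms of its predecessor and so contributes f - 3 new residues. That reading is an inference from the numbers on the slide, not a formula stated in it; the function name and the `overlap` parameter are illustrative.

```python
def states_per_residue(library_size, f, overlap=3):
    """Effective number of choices per newly placed residue, assuming each
    fragment of length f adds f - overlap residues to the chain."""
    return library_size ** (1.0 / (f - overlap))

for size, f in [(100, 5), (20, 5), (50, 7)]:
    print(size, f, round(states_per_residue(size, f), 2))
# prints:
# 100 5 10.0
# 20 5 4.47
# 50 7 2.66
```

Under this assumption, a smaller number of states per residue means a coarser discretization, which matches the trend of larger cRMS values in the results.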
Assembly of sub-structural units
“Protein structure prediction via combinatorial assembly of sub-structural units,” Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003
CombDock
Input: structural units (SUs) with known 3D conformations
  SUs considered rigid bodies, rotated and translated with respect to each other
Goal: predict overall structure
Constraints
  Penetration: avoid steric clashes
  Backbone: restriction on maximum distance between consecutive SUs
All pairs docking
  N(N-1)/2 pairs of SUs
  Calculate candidate transformations according to matching complementary local features on the surface of SUs
  Apply transformation on 2nd SU of pair
  Keep K best for each pair
  Clustering to ensure all K transformations yield significantly different complexes
Combinatorial assembly
Multigraph representation
  Vertices = SUs
  Edges = transformations between two SUs
  K parallel edges between any two vertices
Final protein conformation = spanning tree
  N SUs, one connected component, no cycles
[Figure: multigraph with parallel edges 1, 2, …, K between vertices i, j, and k; the transformation between i and k is induced by transformations (ij, jk)]
Combinatorial Assembly
$N^{N-2} K^{N-1}$ different spanning trees
  Not all spanning trees are valid complexes
  Use a heuristic algorithm
Two subtrees are adjacent iff there exists an index i such that vertex i is in one subtree and vertex i+1 is in the other
Sequential tree: recursive definition
  One vertex
  Tree with an edge that connects two adjacent sequential trees
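The size of the search space quoted above can be checked directly. By Cayley's formula there are N^(N-2) labelled trees on N vertices, and each of a tree's N - 1 edges can be any one of K parallel edges. The function name below is illustrative.

```python
def spanning_tree_count(n, k):
    """Number of spanning trees of a complete multigraph on n vertices
    with k parallel edges between every pair of vertices."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    if n == 1:
        return 1  # a single vertex is its own (edgeless) spanning tree
    return n ** (n - 2) * k ** (n - 1)

# Even modest inputs give a space far too large to enumerate.
count = spanning_tree_count(10, 10)  # 10**8 * 10**9 = 10**17
```

This growth is why exhaustive enumeration is impractical and a heuristic restricted to sequential trees is used instead.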
Combinatorial Assembly
Hierarchical algorithm of N stages
  ith stage: generate sequential trees with i vertices
  Construct trees by connecting adjacent sequential trees of smaller sizes generated earlier
Keep D best sequential trees at each step
  Discard trees which do not meet backbone and penetration constraints
  Score = sum of scores of transformations
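The hierarchical assembly can be sketched as below. This is a simplified illustration, not the CombDock implementation. It assumes that higher scores are better, that the D best trees are kept per stage across all index ranges, and that joined subtrees are vertex-disjoint (so every sequential tree covers a contiguous range of SU indices). `edge_score` and `is_valid` are hypothetical callables standing in for the docking scores and the backbone/penetration checks.

```python
from itertools import product

def assemble(n, k, d, edge_score, is_valid=lambda tree: True):
    """Build sequential trees over SUs 0..n-1, keeping the d best per stage.

    A tree is (lo, hi, score, edges): it spans SUs lo..hi, and edges holds
    (u, v, t) triples, where t indexes one of the k candidate
    transformations between SUs u and v.
    """
    # Stage 1: every single SU is a sequential tree with score 0.
    by_size = {1: [(i, i, 0.0, ()) for i in range(n)]}
    for size in range(2, n + 1):
        found = {}  # edge set -> tree, so each tree is counted once
        for left_size in range(1, size):
            pairs = product(by_size[left_size], by_size[size - left_size])
            for left, right in pairs:
                if left[1] + 1 != right[0]:
                    continue  # not adjacent: left must end where right begins
                for u in range(left[0], left[1] + 1):
                    for v in range(right[0], right[1] + 1):
                        for t in range(k):
                            edges = left[3] + right[3] + ((u, v, t),)
                            score = left[2] + right[2] + edge_score(u, v, t)
                            tree = (left[0], right[1], score, edges)
                            key = frozenset(edges)
                            if key not in found and is_valid(tree):
                                found[key] = tree
        ranked = sorted(found.values(), key=lambda tr: tr[2], reverse=True)
        by_size[size] = ranked[:d]
    return by_size[n]

# Toy usage: 3 SUs, 2 transformations per pair, score equals the index t.
best = assemble(3, 2, 1, lambda u, v, t: float(t))
```

Truncating to the D best trees at each stage bounds the work per stage, at the cost of possibly discarding a partial tree that would have led to the best full complex.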
Conclusion
Experimental Methods
  X-ray crystallography
  NMR spectroscopy
Computational Methods
  Homology modeling
    Similar sequences fold into similar structures
  Threading
    Dissimilar sequences may fold into similar structures
  Ab initio
    No similarity assumptions
    Conformational search