Protein Structure Prediction Samantha Chui Oct. 26, 2004.

Protein Structure Prediction

Samantha Chui, Oct. 26, 2004

Central Dogma of Biology

Question: Given a protein sequence, to what conformation will it fold?

DNA sequence → (transcription & translation) → protein sequence → (folding) → protein structure

How does nature do it?

Hydrophobicity vs. hydrophilicity

Van der Waals interaction

Electrostatic interaction

Hydrogen bonds

Disulfide bonds

Current Approaches

Experimental methods: X-ray crystallography, NMR spectroscopy

Computational methods:

Homology modeling: similar sequences fold into similar structures

Threading: dissimilar sequences may fold into similar structures

Ab initio: no similarity assumptions; conformational search

Assembly of sub-structural units

[Diagram labels: known structures, …, fragment library, protein sequence, predicted structure]

“Small Libraries of Protein Fragments Model Native Protein Structures Accurately,” Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002

Goal: Find finite set of protein fragments that can be used to construct accurate discrete conformations for any protein

1. Generate fragments from known proteins

2. Cluster fragments to identify common structural motifs

3. Test library accuracy on proteins not in the initial set

Datasets of protein fragments

200 unique protein domains from the Protein Data Bank (PDB), 36,397 residues

Four sets of backbone fragments: 4-, 5-, 6-, and 7-residue-long fragments

Divide each protein domain into consecutive fragments of length f, beginning at a random initial position
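The splitting step can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_fragments`, the choice of a start offset in the range 0 to f-1, and the reading of "consecutive" as non-overlapping are assumptions, not details taken from the paper.

```python
import random

def split_into_fragments(ca_coords, f, rng=None):
    """Cut a chain of C-alpha coordinates into consecutive, non-overlapping
    fragments of length f, starting at a random offset (assumed: 0..f-1)."""
    rng = rng or random.Random()
    start = rng.randrange(f)  # random initial position
    # Step by f so fragments do not overlap; drop any incomplete tail.
    return [ca_coords[i:i + f]
            for i in range(start, len(ca_coords) - f + 1, f)]

# Toy chain: 12 residues on a straight line (not real protein geometry).
chain = [(float(i), 0.0, 0.0) for i in range(12)]
fragments = split_into_fragments(chain, f=5, rng=random.Random(0))
```

Every returned fragment has exactly f residues; residues before the random start and any incomplete tail are simply left out.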

Fragment structural similarity

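Coordinate root-mean-square (cRMS) deviation of Cα atoms

$\mathrm{cRMS}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2}$, where $d_i$ is the distance between the i-th pair of matched atoms and N is the number of atom pairs

One-to-one mapping between atoms in structure A and structure B

Translate and rotate to find the best alignment

cRMS is 0 if the structures superimpose perfectly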
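As a minimal sketch of the formula only, the Python function below computes the cRMS of two coordinate lists that are assumed to be already superimposed. The translate-and-rotate search for the best alignment is deliberately left out, and the name `crms` is a placeholder chosen for this example.

```python
import math

def crms(coords_a, coords_b):
    """sqrt(sum(d_i^2) / N) over paired atoms.

    Assumes atom i of A is matched to atom i of B and that the two
    structures have ALREADY been optimally superimposed; no rotation or
    translation is performed here.
    """
    n = len(coords_a)
    if n == 0 or n != len(coords_b):
        raise ValueError("structures need the same, non-zero number of atoms")
    sq_sum = 0.0
    for p, q in zip(coords_a, coords_b):
        sq_sum += sum((x - y) ** 2 for x, y in zip(p, q))  # d_i squared
    return math.sqrt(sq_sum / n)

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# b is a shifted by (3, 4, 0) and intentionally NOT re-superimposed,
# so every d_i equals 5 in this toy case.
b = [(x + 3.0, y + 4.0, z) for x, y, z in a]
```

With a superposition step in front, the shifted copy `b` would score 0; without it, the function reports the raw per-atom displacement.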

Pruning and clustering

Outliers have a large cRMS deviation from all other fragments

Discard outliers according to a fragment-length-specific threshold

k-means simulated annealing clustering

Repeatedly run k-means clustering, merge nearby clusters, and split dispersed clusters

Scoring function: total variance = $\sum (x - \mu)^2$, where μ is the centroid of the cluster containing fragment x

Less sensitive to the initial choice of cluster centers than k-means
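The scoring function can be illustrated with a short Python sketch. It treats each fragment as a plain coordinate vector and uses the arithmetic mean as the cluster center, which is a simplification made for this example; the merge and split moves of the annealing procedure are not shown.

```python
def total_variance(clusters):
    """Sum over all clusters of squared distances from each point to the
    mean of its own cluster. `clusters` is a list of lists of tuples."""
    score = 0.0
    for points in clusters:
        dim = len(points[0])
        mean = [sum(p[k] for p in points) / len(points) for k in range(dim)]
        score += sum((p[k] - mean[k]) ** 2 for p in points for k in range(dim))
    return score

# Toy 2-D "fragments": two tight points and one far away.
split_score = total_variance([[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]])
merged_score = total_variance([[(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]])
```

A lower score means tighter clusters, so here keeping the distant point in its own cluster scores better than merging everything.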

Compiling the libraries

Select cluster centroids as library entries

Centroid: the fragment with the minimum sum of cRMS deviations from all the other fragments in its cluster

Centroids form a representative set of protein fragments

Library contents are highly dependent on the clustering procedure

For each set of fragments, start with 50 random seeds and choose the library with the minimal total variance score
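Both selection rules on this slide reduce to a minimum search, sketched below in Python. The names `pick_representative` and `best_of_runs`, and the pluggable `dist`, `run_clustering`, and `score` callables, are hypothetical helpers for illustration; in the slides' setting `dist` would be cRMS and `score` the total variance.

```python
def pick_representative(cluster, dist):
    """Return the cluster member whose summed distance to all the other
    members is smallest (its distance to itself contributes 0)."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))

def best_of_runs(run_clustering, score, seeds):
    """Run the clustering once per seed and keep the lowest-scoring result."""
    return min((run_clustering(seed) for seed in seeds), key=score)

# Toy 1-D example with absolute difference standing in for cRMS.
rep = pick_representative([1.0, 2.0, 10.0], dist=lambda a, b: abs(a - b))

# Toy "clustering": each seed just yields a precomputed score.
fake_scores = {0: 5.0, 1: 3.0, 2: 4.0}
best = best_of_runs(lambda seed: seed, lambda result: fake_scores[result], range(3))
```

In the toy data the middle value 2.0 is chosen as representative, and the run with seed 1 wins because its score is lowest.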

Evaluating quality of a library

Local-fit: how well the library fits the local conformation of all proteins in the test set

Global-fit: how well the library fits the global three-dimensional conformation of all proteins in the test set

Local-fit method

Protein structures broken into set of all overlapping fragments of length f

Find for each protein fragment the most similar fragment in the library (cRMS)

Score = Average cRMS value over all fragments in all proteins in the test set
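The local-fit score described above can be sketched as follows. The function name `local_fit_score` is an assumption, and the distance is passed in as a parameter; the toy example uses a 1-D RMS difference without superposition as a stand-in for cRMS.

```python
import math

def local_fit_score(proteins, library, f, dist):
    """Average, over every overlapping length-f fragment of every protein,
    of the distance to the closest library entry."""
    best = []
    for chain in proteins:
        for i in range(len(chain) - f + 1):   # all overlapping fragments
            fragment = chain[i:i + f]
            best.append(min(dist(fragment, entry) for entry in library))
    return sum(best) / len(best)

def toy_rms(a, b):
    # Stand-in for cRMS: RMS difference of 1-D values, no superposition.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

score = local_fit_score(proteins=[[0.0, 1.0, 2.0]],
                        library=[(0.0, 1.0)], f=2, dist=toy_rms)
```

The toy chain has two overlapping fragments, one matching the single library entry exactly and one off by 1 in each position, so the average is 0.5.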

Local-fit results

Global-fit method

Concatenate best local-fit library fragments just found

Determine fragment’s orientation by superimposing its first three Cα atoms onto last three Cα atoms of preceding fragment

Global-fit method

Number of possible sequences of fragments is exponential in the protein’s length

Greedy algorithm finds a good rather than the best global-fit approximation

Start at the N terminus and approximate increasingly larger segments of the protein

Concatenate the library fragment that yields the structure with minimal cRMS deviation from the corresponding segment

Deterministic, linear time
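The greedy build-up can be illustrated on a deliberately simplified toy model. In the Python sketch below, a 2-D chain with unit bond length replaces the 3-D backbone, a small set of turn angles replaces the fragment library, and deviations are measured in a fixed frame without superposition. All of these are assumptions made for illustration and are not the setup used in the paper.

```python
import math

def greedy_trace(target, turn_states):
    """Trace a 2-D target chain one residue at a time, always choosing the
    turn angle that keeps the growing model closest to the target."""
    model = [target[0], target[1]]            # first bond fixes the frame
    heading = math.atan2(target[1][1] - target[0][1],
                         target[1][0] - target[0][0])
    choices, sq_err = [], 0.0
    for k in range(2, len(target)):           # single pass: linear time
        best = None
        for idx, turn in enumerate(turn_states):
            h = heading + turn
            p = (model[-1][0] + math.cos(h), model[-1][1] + math.sin(h))
            err = (p[0] - target[k][0]) ** 2 + (p[1] - target[k][1]) ** 2
            # Earlier residues are fixed, so minimizing the new residue's
            # error minimizes the deviation of the whole prefix.
            if best is None or err < best[0]:
                best = (err, idx, h, p)
        err, idx, heading, p = best
        sq_err += err
        choices.append(idx)
        model.append(p)
    return choices, math.sqrt(sq_err / len(target))

target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
states = [-math.pi / 2, 0.0, math.pi / 2]
choices, rms = greedy_trace(target, states)
```

Because each step commits to the locally best state and never revisits it, the procedure is deterministic and does a fixed amount of work per residue, but it can miss the globally best sequence of states.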

Global-fit results

[Figure labels: 100 fragments, 5 residues; 10 states/residue; 4.47 states/residue; 0.91 Å; 1.85 Å; 2.66 states/residue; 2.78 Å]
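The "states/residue" labels are consistent with the following reading, offered here as an assumption rather than a statement from the slides: because each concatenated fragment overlaps the previous one by three Cα atoms, a fragment of length f adds f - 3 new residues, so a library of s fragments corresponds to s^(1/(f-3)) states per residue. A short Python check of the one case the slide spells out:

```python
def states_per_residue(library_size, fragment_length, overlap=3):
    """Effective number of discrete states per residue, assuming each new
    fragment contributes (fragment_length - overlap) new residues."""
    return library_size ** (1.0 / (fragment_length - overlap))

# The case given on the slide: 100 fragments of 5 residues.
example = states_per_residue(100, 5)
```

Under this reading, 100 fragments of length 5 give 100^(1/2) = 10 states per residue, matching the figure label.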

Assembly of sub-structural units

[Diagram labels: known structures, …, fragment library, protein sequence, predicted structure]

“Protein structure prediction via combinatorial assembly of sub-structural units,” Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003

CombDock

Input: structural units (SUs) with known 3D conformations

SUs are treated as rigid bodies, rotated and translated with respect to each other

Goal: predict the overall structure

Constraints

Penetration: avoid steric clashes

Backbone: restriction on the maximum distance between consecutive SUs

All pairs docking

N(N-1)/2 pairs of SUs

Calculate candidate transformations by matching complementary local features on the surfaces of the SUs

Apply each transformation to the 2nd SU of the pair

Keep the K best transformations for each pair

Cluster to ensure that all K transformations yield significantly different complexes
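The bookkeeping of this stage (enumerate every pair, keep the K best candidates per pair) can be sketched in Python. The surface-feature matching and the clustering for diversity are not modeled; candidate scores are assumed to be given, with higher meaning better, and the function names are placeholders.

```python
import heapq
from itertools import combinations

def all_pairs(n_units):
    """All unordered pairs of SU indices: N(N-1)/2 of them."""
    return list(combinations(range(n_units), 2))

def keep_k_best(candidates, k):
    """candidates: list of (score, transformation_label) tuples.
    Assumes a higher score is better; diversity clustering is omitted."""
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

pairs = all_pairs(4)
top = keep_k_best([(0.2, "a"), (0.9, "b"), (0.5, "c")], k=2)
```

For four SUs this yields six pairs, and the toy candidate list is trimmed to its two highest-scoring entries.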

Combinatorial assembly

Multigraph representation

Vertices = SUs

Edges = transformations between two SUs

K parallel edges between any two vertices

Final protein conformation = spanning tree: N SUs, one connected component, no cycles

[Figure: vertices i, j, k joined by parallel edges 1, 2, …, K]

The transformation between i and k is induced by the transformations (ij, jk)
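How two transformations induce a third can be shown by composing rigid motions. In the Python sketch below, a transformation is a pair (R, s) acting as x ↦ R·x + s, and the transformation for edge ij is assumed to place SU j in the frame of SU i. This convention and the helper names are assumptions for illustration.

```python
def mat_vec(R, v):
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))

def mat_mul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(3))
                       for c in range(3)) for r in range(3))

def apply(t, point):
    """Apply the rigid transformation t = (R, s) to a point."""
    R, s = t
    return tuple(a + b for a, b in zip(mat_vec(R, point), s))

def compose(t_ij, t_jk):
    """Transformation i<-k induced by i<-j and j<-k:
    x -> R_ij (R_jk x + s_jk) + s_ij."""
    R_ij, s_ij = t_ij
    R_jk, s_jk = t_jk
    R = mat_mul(R_ij, R_jk)
    s = tuple(a + b for a, b in zip(mat_vec(R_ij, s_jk), s_ij))
    return R, s

identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
rot_z_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))   # 90 degrees about z
t_ij = (rot_z_90, (1, 0, 0))
t_jk = (identity, (0, 2, 0))
t_ik = compose(t_ij, t_jk)
```

Applying `t_ik` to a point gives the same result as applying `t_jk` and then `t_ij`, which is why a spanning tree of pairwise transformations fixes the position of every SU.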

Combinatorial Assembly

$N^{N-2}K^{N-1}$ different spanning trees

Not all spanning trees are valid complexes

Use a heuristic algorithm
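To see how quickly the search space grows, the count can be evaluated directly: a complete graph on N vertices has N^(N-2) spanning trees (Cayley's formula), and each of the N-1 tree edges can be any of K parallel edges. The values N = 5 and K = 10 below are arbitrary illustrative choices, not numbers from the paper.

```python
def count_spanning_trees(n_units, k_edges):
    """N^(N-2) * K^(N-1): spanning trees of the complete multigraph on
    n_units vertices with k_edges parallel edges per vertex pair (N >= 2)."""
    if n_units < 2:
        raise ValueError("need at least two structural units")
    return n_units ** (n_units - 2) * k_edges ** (n_units - 1)

n_trees = count_spanning_trees(5, 10)   # 5**3 * 10**4
```

Even this small case gives 1,250,000 candidate trees, which motivates a heuristic rather than exhaustive enumeration.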

Two subtrees are adjacent iff there exists an index i such that vertex i is in one subtree and vertex i+1 is in the other

Sequential tree: recursive definition

One vertex, or

A tree with an edge that connects two adjacent sequential trees


Hierarchical algorithm of N stages

ith stage: generate sequential trees with i vertices

Construct trees by connecting adjacent sequential trees of smaller sizes generated earlier

Keep the D best sequential trees at each step

Discard trees that do not meet the backbone and penetration constraints

Score = sum of the scores of the transformations
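The staged construction can be sketched as a dynamic program over runs of consecutive SUs. The Python below is an illustration under stated assumptions, not the CombDock implementation: the D best trees are kept per run of consecutive SUs, higher scores are better, and the geometric backbone and penetration checks are reduced to a hypothetical `is_valid` hook that accepts everything by default.

```python
import heapq

def assemble(n_units, edge_scores, d_best, is_valid=lambda tree: True):
    """Return up to d_best sequential trees spanning units 0..n_units-1.

    edge_scores[(u, v)] with u < v lists the scores of the K candidate
    transformations between units u and v. A tree is (score, edges),
    where edges is a sorted tuple of (u, v, k) triples.
    """
    # trees[(i, j)]: best trees covering the consecutive units i..j.
    trees = {(i, i): [(0.0, ())] for i in range(n_units)}
    for size in range(2, n_units + 1):          # stage: trees of `size` vertices
        for i in range(n_units - size + 1):
            j = i + size - 1
            found = {}                          # edge set -> score (dedup)
            for m in range(i, j):               # adjacent subtrees i..m, m+1..j
                for l_score, l_edges in trees[(i, m)]:
                    for r_score, r_edges in trees[(m + 1, j)]:
                        for u in range(i, m + 1):
                            for v in range(m + 1, j + 1):
                                for k, s in enumerate(edge_scores.get((u, v), [])):
                                    edges = tuple(sorted(l_edges + r_edges + ((u, v, k),)))
                                    tree = (l_score + r_score + s, edges)
                                    if is_valid(tree):
                                        found[edges] = tree[0]
            trees[(i, j)] = heapq.nlargest(
                d_best, ((s, e) for e, s in found.items()), key=lambda t: t[0])
    return trees[(0, n_units - 1)]

scores = {(0, 1): [1.0, 0.5], (1, 2): [2.0], (0, 2): [0.1]}
result = assemble(3, scores, d_best=2)
```

In the toy input the best tree joins units 0-1 and 1-2 with their top-scoring transformations, for a total score of 3.0; pruning to D trees per run keeps the work polynomial at the cost of possibly discarding the true optimum.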

CombDock Results

Conclusion

Experimental methods: X-ray crystallography, NMR spectroscopy

Computational methods:

Homology modeling: similar sequences fold into similar structures

Threading: dissimilar sequences may fold into similar structures

Ab initio: no similarity assumptions; conformational search

[Diagram labels: known structures, …, fragment library, protein sequence, predicted structure]

References

Kolodny et al., “Small Libraries of Protein Fragments Model Native Protein Structures Accurately,” 2002

Inbar et al., “Protein structure prediction via combinatorial assembly of sub-structural units,” 2003
