Representations of Molecular Structure: Bonds Only.

Post on 30-Dec-2015


Representations of Molecular Structure: Bonds Only

Representations of Molecular Structure: Atoms Only

Representations of Molecular Structure: Atoms and Bonds

Representations of Molecular Structure: Ribbons

Representations of Molecular Structure: Mixed

Representations of Molecular Structure: van der Waals Surface

Representations of Molecular Structure: Solvent Excluded Surface

Protein Structure Prediction

Protein folding is different from structure prediction

• Folding is concerned with the process by which the chain takes on its 3D shape, usually modeled from physical principles.

• Prediction uses any statistical, theoretical or empirical data to try to get at the end result.

Protein Structure Prediction

• A bit of history: Asilomar, 1994, 1996, 1998, 2000, 2002, & 2004 (pending)

• Three approaches to structure prediction:

a. Homology modeling

b. Sequence-structure threading

c. Ab initio prediction

Asilomar

• Experimentalists who had structures that would be solved before date of CASP meeting submitted the sequences of the unknowns to a central repository.

• Predictors could download sequence and minimal information about protein (name), and could enter one of three categories.

• Assessors use automatic programs for analysis in addition to expertise to evaluate quality of predictions.

CASP6 in Numbers

• Number of human expert groups registered: 228

• Number of prediction servers registered: 65

• Number of targets released: 87

• Targets canceled: 11

• Valid targets: 76

• Targets for human expert prediction: 76

• Targets for server prediction: 76

CASP6: Accepted Predictions

Prediction format          No. groups     No. 1 Models   All Models
3D coordinates             166            8276           27472
Alignments to PDB          37             1726           5250
Residue-residue contacts   16             983            1664
Domains assignments        24             1230           1546
Disordered regions         20             1365           1695
Function prediction        25             1033           1179
All                        228 (unique)   14613          38806

Asilomar Categories

• Homology Modeling (sequences with high homology to sequences of known structures)

Given a sequence with > 25-30% sequence identity to a known structure in the PDB, use the known structure as the starting point for a model of the 3D structure of the sequence.

Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between known “template” and unknown.
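The 25-30% threshold above can be checked on a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned with "-" as the gap character; the function names and the 30% default are illustrative choices, not part of any CASP tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def suggested_category(identity_pct: float, threshold: float = 30.0) -> str:
    """Rough category choice; the threshold is an assumption taken from the slide."""
    if identity_pct > threshold:
        return "homology modeling"
    return "fold recognition or ab initio"


# Hypothetical aligned pair: 4 of 5 gap-free columns match.
identity = percent_identity("ACDE-F", "ACDQGF")
category = suggested_category(identity)
```

Counting only gap-free columns is one of several conventions for percent identity; other denominators (alignment length, shorter sequence length) give different numbers for the same alignment.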

Asilomar Categories

• Fold recognition (sequences with no significant sequence identity (<= 30%) to sequences of known structure)

Given the sequence, and a set of folds observed in the PDB, see if the sequence could adopt one of the known folds.

Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions).

Fold Recognition

• New sequence:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…

• Library of known folds:

Asilomar Categories

• Ab initio prediction (no known homology with any sequence of known structure)

Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles.

Secondary structure prediction and multiple alignment techniques used to predict features of these molecules. Then, some method necessary for assembling 3D structure.

Ab initio prediction

• New sequence:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…

• Predict secondary structure:

MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL…
HHHHHCCCCCHHHHHHHHHHCCCCBBBBBBBCCBBBB…

• Predict 3D structure entirely:

Asilomar Results

How to evaluate predictions?

• RMSD

• Overall identification and topology of secondary structures

• Energy considerations (contacts, H-bonds)

• Similarity of hydrophobic core

• Sequence alignment quality (and systematic shift)
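As a worked illustration of the first criterion, RMSD between a model and the experimental structure can be computed as below. This is a minimal sketch that assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; it does not perform the optimal superposition that assessment software normally applies first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: one atom matches, one is displaced by 2 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experiment = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(model, experiment)  # sqrt(4 / 2) = sqrt(2)
```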

Homology Modeling

• When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD).

• Sophisticated energy minimization techniques do not dramatically improve upon initial guess.

• Rigorous criteria applied such as torsion angles, van der Waals violations, RMSD.

Homology Modeling Samples

Thick backbone shows known structure. Thin lines show modeled structures. Some sidechains are not positioned correctly, but backbone and other sidechains look quite good.

Homology Modeling Mistakes

• a. Sidechain mistakes

• b. Shifts with correct alignment

• c. No template

• d. Misalignment

• e. Incorrect template

Limitations of Homology Modeling

Useful Conclusions from CASP

• Use of sensitive multiple alignment techniques helped get best alignments.

• Side chain modeling uses libraries of known amino acid conformations. Success ranged from 45% to 80% correct (correct = side-chain angles within 30° of the experimental structure).

• Energy based refinement still not improving the structures.
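The "within 30°" criterion for side chains can be sketched as follows. The code assumes the angles being compared are side-chain torsion angles given in degrees (the slide says only "angles"), and it handles the wrap-around at ±180°; the function names are illustrative.

```python
def angle_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def fraction_correct(predicted, experimental, tolerance_deg: float = 30.0) -> float:
    """Fraction of predicted angles within the tolerance of the experimental ones."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    hits = sum(
        1
        for p, e in zip(predicted, experimental)
        if angle_difference(p, e) <= tolerance_deg
    )
    return hits / len(predicted)


# Hypothetical angles: differences are 5°, 15° (across the ±180° wrap) and 120°.
score = fraction_correct([60.0, -170.0, 180.0], [65.0, 175.0, 60.0])  # 2 of 3
```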

Ab Initio Predictions – From Primary to Secondary

• Range of accuracy from 66% to 77% (3 state labeling: helix, coil or beta).

• Human hand editing improves the accuracy.

• Multiple sequence alignments improve the performance of secondary structure prediction.
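The accuracy figures quoted above are per-residue percentages over the three states. A minimal sketch of that calculation, using the H (helix), C (coil) and B (beta) labels from the earlier example slide; the function name is an illustrative choice:

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted state (H, C or B) matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty strings of equal length")
    allowed = set("HCB")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be H, C or B")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(predicted)


# Hypothetical 8-residue example: 6 of 8 states agree.
accuracy = three_state_accuracy("HHHCCBBB", "HHCCCBBH")  # 75.0
```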

Ab Initio Predictions – From Secondary to Tertiary

• Sensitive to errors in secondary structure

• Predictors were more likely to predict previously known structures.

Ab Initio Predictions – From Primary to Tertiary

• Predict interresidue contacts and then compute structure (mild success)

• Simplified energy term + reduced search space (phi/psi or lattice) (moderate success)

• Creative ways to memorize sequence <-> structure correlations in short segments from the PDB, and use these to model new structures: the database method (moderate success)

Ab Initio Predictions – Tertiary (1 to 3): Good Methods

• Associate the sequence of the unknown with a library of known 3D structures, then optimize the contact frequency of amino acids, as measured in the PDB (Baker et al).

• Generate all folds on lattice and then filter the bad ones out (Samudrala et al)

• Combine multiple sequence alignment, secondary structure prediction and lattice. (Skolnick et al)

Lattice Model: Overcoming Entropic Barriers
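To give a feel for why restricting the chain to a lattice makes enumeration possible yet still expensive, the sketch below counts self-avoiding walks on a 2D square lattice. The choice of a 2D square lattice is an assumption made for illustration; the slides do not say which lattice the cited groups used, and this is not their method.

```python
def count_self_avoiding_walks(n_steps: int) -> int:
    """Count chain conformations of n_steps bonds on a 2D square lattice."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(position, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (position[0] + dx, position[1] + dy)
            if nxt not in visited:  # no two residues may share a lattice site
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    start = (0, 0)
    return extend(start, {start}, n_steps)


# Conformation counts grow roughly geometrically with chain length.
counts = [count_self_avoiding_walks(n) for n in range(1, 6)]  # [4, 12, 36, 100, 284]
```

Even on this small lattice the count multiplies by nearly three per added bond, which is why exhaustive enumeration is limited to short chains and why filtering or sampling is needed beyond that.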

Substructure/Fragment Model: Overcoming Entropic Barriers

• Break target into fragments of 9 amino acids

• Search for similar PDB sequences based on sequence similarity

• Start with extended chain, and evaluate the effect of introducing the fragments into the chain.
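The first two steps above can be sketched as follows. This is a simplified illustration, assuming overlapping 9-residue windows and ranking library segments by the number of identical positions; the library contents and function names are hypothetical, and real fragment pickers use richer similarity measures.

```python
def split_into_fragments(sequence: str, length: int = 9):
    """Return (start_index, fragment) for every overlapping window of the given length."""
    return [
        (i, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


def rank_by_identity(fragment: str, library):
    """Sort equal-length library segments by identical positions, best first."""
    candidates = [seg for seg in library if len(seg) == len(fragment)]

    def matches(segment):
        return sum(1 for x, y in zip(fragment, segment) if x == y)

    return sorted(candidates, key=matches, reverse=True)


target = "MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKSAEIKELL"  # sequence shown on the earlier slide
fragments = split_into_fragments(target)              # 32 windows of 9 residues
# Hypothetical library segments standing in for 9-residue pieces taken from the PDB.
library = ["MLDTNAAAA", "AAAAAAAAA", "MLDTNMKTA"]
ranked = rank_by_identity(fragments[0][1], library)
```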

Substructure/Fragment Model: Overcoming Entropic Barriers

• Use Metropolis-type algorithm for optimization, using following terms:

– hydrophobic burial

– polar side-chain interactions

– hydrogen bonding between beta-strands

– hard sphere repulsion (van der Waals)

• Create 1000 structures, cluster them.

• Choose one representative from each cluster as possible prediction…
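The Metropolis-type optimization named above follows a simple acceptance rule: always accept a move that lowers the energy, and accept an uphill move with probability exp(-ΔE/T). The sketch below shows that rule on a toy one-dimensional problem. The energy function, move set and temperature are hypothetical stand-ins; the scoring terms listed on the slide are not implemented here, and temperature is expressed in energy units.

```python
import math
import random


def metropolis_accept(delta_energy: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: downhill moves always pass, uphill moves pass by chance."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def optimize(energy, propose, state, steps: int, temperature: float, seed: int = 0):
    """Run a Metropolis search and return the lowest-energy state seen."""
    rng = random.Random(seed)
    current_energy = energy(state)
    best_state, best_energy = state, current_energy
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_energy = energy(candidate)
        if metropolis_accept(candidate_energy - current_energy, temperature, rng):
            state, current_energy = candidate, candidate_energy
            if current_energy < best_energy:
                best_state, best_energy = state, current_energy
    return best_state, best_energy


def toy_energy(x: int) -> float:
    """Hypothetical energy with a single minimum at x = 3."""
    return float((x - 3) ** 2)


def toy_propose(x: int, rng: random.Random) -> int:
    """Hypothetical move: step one unit left or right."""
    return x + rng.choice((-1, 1))


best_state, best_energy = optimize(toy_energy, toy_propose, state=10, steps=500, temperature=0.5)
```

In a fragment-based method the state would be the chain conformation and a move would be the insertion of a fragment; repeating the run with different seeds gives the pool of structures that is then clustered.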

Success Stories of Rosetta

Fold Recognition Becoming More Important

• CASP1: Of 21 target proteins, 11 wound up having folds that were previously known.

• CASP2: Of 22 targets, 15 with available folds

• CASP3: Of 43 targets, 36 with available folds

• …

Fold Recognition

• Every predictor does well on something.

• Common folds (more examples) are easier to recognize.

• Fold recognition was the surprise performer at the first competition. Incremental progress at second, third, fourth …

Fold Recognition

• Not “all or none”. List of top N hits much better than top hit.

• Common folds easier to recognize.

• Quality of alignments that result is NOT good.

• Potentials include: residue pair contact terms, hydrophobicity, polarity, H-bonds, local structure terms.

1 = target, 2 = Fold in PDB

Elements of a fold recognition algorithm

• Library of protein structures, suitably processed

– All structures

– Representative subset

– Structures with loops removed

• Scoring function

– contact potential

– environmental evaluation function

• Method for generating initial alignments and/or searching for better alignments.

Scoring: Contact Potential

• Instead of modeling energies from first physical principles, simplify the problem by positioning only amino acids, and compute empirical energies from the observed associations of amino acids.

• “GLU is attracted to LYS” = E(glu, lys)

Scoring: Contact Potential

• Create energy terms between amino acids:

E(interaction) = -KT ln[frequency of interaction]

• Frequency of interaction is measured in database of known structures. Higher frequency, more favorable interaction.
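A minimal sketch of turning observed contact counts into energies with the relation above. The counts are made up for illustration, frequencies are taken as the share of all counted contacts (one of several possible normalizations), and KT is set to 1 so energies come out in units of KT.

```python
import math


def contact_energies(pair_counts, kt: float = 1.0):
    """E = -KT ln(frequency), with frequency = count / total counted contacts."""
    total = sum(pair_counts.values())
    return {
        pair: -kt * math.log(count / total)
        for pair, count in pair_counts.items()
        if count > 0  # unobserved pairs have no defined energy here
    }


# Hypothetical counts of residue pairs seen in contact in a structure database.
counts = {("GLU", "LYS"): 50, ("GLU", "GLU"): 10, ("LEU", "VAL"): 40}
energies = contact_energies(counts)
# The more frequent GLU-LYS contact gets the lower (more favorable) energy.
```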

Sippl Contact Potential

Given:

a = amino acid type a (ALA, VAL, etc.)

b = amino acid type b

s = separation in sequence

r = distance in space between the two residues

ΔE_{ab,s}(r) = E_{ab,s}(r) - E_s(r)

Energy of interaction between a and b minus average energy at that separation equals the energy difference that contributes to stability.

Sippl Contact Potential

Thus we have:

ΔE_{ab,s}(r) = -KT ln [ f_{ab,s}(r) / f_s(r) ]

where f_{ab,s}(r) is the observed frequency of the a,b pair at distance r and sequence separation s, and f_s(r) is the frequency over all pairs at that separation.

• For any given sequence in 3D, compute distances between all pairs of amino acids (usually up to r = 10-15 Å), and sum.

• ΔE_tot = Σ ΔE_{ab,s}(r) over all a,b pairs
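The pairwise sum above can be sketched in code as follows. This is an illustration under stated assumptions, not Sippl's implementation: distances are binned in 1 Å bins, the 15 Å cutoff is taken from the range quoted above, pair types are treated as symmetric, pairs missing from the tables are skipped, and both frequency tables are hypothetical inputs.

```python
import math


def delta_e(f_pair: float, f_reference: float, kt: float = 1.0) -> float:
    """Energy of a pair relative to the average at the same separation and distance."""
    return -kt * math.log(f_pair / f_reference)


def total_energy(residues, coords, pair_freq, ref_freq,
                 bin_width: float = 1.0, cutoff: float = 15.0, kt: float = 1.0) -> float:
    """Sum pair energies over all residue pairs closer than the cutoff.

    pair_freq maps (type_a, type_b, separation, distance_bin) to a frequency;
    ref_freq maps (separation, distance_bin) to the all-pairs frequency.
    """
    total = 0.0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            r = math.dist(coords[i], coords[j])
            if r > cutoff:
                continue
            separation = j - i
            distance_bin = int(r // bin_width)
            a, b = sorted((residues[i], residues[j]))  # symmetric pair key (assumption)
            f_pair = pair_freq.get((a, b, separation, distance_bin))
            f_reference = ref_freq.get((separation, distance_bin))
            if not f_pair or not f_reference:
                continue  # no statistics for this pair: skipped (assumption)
            total += delta_e(f_pair, f_reference, kt)
    return total


# Hypothetical two-residue example: GLU and LYS, 4.5 Å apart, adjacent in sequence.
pair_table = {("GLU", "LYS", 1, 4): 0.2}
reference_table = {(1, 4): 0.1}
energy = total_energy(["GLU", "LYS"], [(0.0, 0.0, 0.0), (0.0, 0.0, 4.5)],
                      pair_table, reference_table)  # -ln(2): favorable
```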

Using Contact Potential

• Given 3D structure, need to mount the sequence on the structure.

– dynamic programming (okay)

– exhaustive enumeration (too expensive; recent paper shows that this is NP-hard)

– heuristic enumeration: limit on gap lengths, loop lengths (heuristic)

• Evaluate the contact potential for the alignment.

• [Optional] Locally optimize the potential score.

• Compare potential with random shuffle of sequence, and with other sequences to approximate z-score.
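The last step, comparing against shuffled sequences, can be sketched as below. The scoring function here is a hypothetical stand-in for the contact potential of a threaded sequence (lower is better); only the shuffling and z-score arithmetic are the point of the example.

```python
import random
import statistics


def shuffle_z_score(sequence: str, score, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of score(sequence) against the scores of shuffled copies."""
    rng = random.Random(seed)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(sequence)
        rng.shuffle(letters)  # same composition, random order
        shuffled_scores.append(score("".join(letters)))
    mean = statistics.mean(shuffled_scores)
    spread = statistics.stdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero spread")
    return (score(sequence) - mean) / spread


# Hypothetical score: minus the number of positions matching a fixed pattern,
# standing in for the potential of the sequence mounted on a fold.
PATTERN = "MLDTNMKTQLKAYLEKLTKP"


def toy_score(sequence: str) -> float:
    return -sum(1 for x, y in zip(sequence, PATTERN) if x == y)


z = shuffle_z_score(PATTERN, toy_score)  # strongly negative: far better than shuffles
```

A strongly negative z-score means the sequence in its real order fits the fold much better than its own composition would by chance, which is the signal a fold recognition method looks for.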

Future of Structure Predictions

• Protein fold recognition will get asymptotically better, as we get more folds.

• Best ab initio methods use knowledge of database, and will thus also improve.

• Estimates are that we now have between 30% and 50% of folds that occur.

• Given fold, we need to improve refinement with homology modeling techniques.