Protein Structure Prediction [Based on Structural Bioinformatics, section VII]


Predicting protein 3D structure

Goal: 3D structure from 1D sequence

What kind of fold might the given sequence adopt?

An existing fold: fold recognition, comparative modeling

A new fold: ab initio

Measuring progress

CASP – Critical Assessment of Structure Prediction

CAFASP – Critical Assessment of Fully Automated Structure Prediction

Targets: unpublished NMR or X-ray structures

Goal: predict target 3D structure and submit it for independent and comparative review

What Forces Hold the Structure?

• Hydrogen bonds

• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups

• Hydrophobic effect

• Disulfide bonds: S-S bonds between cysteine residues

Homology modeling

Based on the two major observations:

1. The structure of a protein is uniquely defined by its amino acid sequence.

2. Similar sequences adopt practically identical structures; distantly related sequences still fold into similar structures.

Growth of the Protein Data Bank

Fraction of New Folds

[Rost, Protein Eng. 1999]

Two zones of sequence alignment

The 7 steps to homology modeling

1. Template recognition and initial alignment
   • BLAST, FASTA

2. Alignment correction
   • Better alignment, MSA

3. Backbone generation
   • Copy backbone atoms [and side-chains of conserved residues]

4. Loop modeling
   • Knowledge based
   • Energy based

5. Side-chain modeling
   • Rotamer: a low energy side-chain conformation
   • Rotamer library [backbone independent, dependent]
   • HUGE search space [~5^N]
   • High accuracy for residues in the hydrophobic core [90%], much lower for residues on the surface [50%]

6. Model optimization
   • Predict the side-chains, then the resulting shifts in the backbone, then the rotamers for the new backbone …

7. Model validation
   • Calculating the model’s energy
   • Determination of normality indices:
      • bond lengths, bond and torsion angles
      • Inside/outside distribution of polar residues
      • Radial distribution function
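The size of the side-chain search space quoted in step 5 can be made concrete with a few lines of arithmetic. The sketch below assumes roughly 5 rotamers per residue, chosen independently for each of N residues, which gives about 5^N combinations. The function names are illustrative only.

```python
import math


def rotamer_combinations(n_residues: int, rotamers_per_residue: int = 5) -> int:
    """Count side-chain assignments when each residue picks one rotamer independently."""
    if n_residues < 0 or rotamers_per_residue < 1:
        raise ValueError("need n_residues >= 0 and rotamers_per_residue >= 1")
    return rotamers_per_residue ** n_residues


def log10_combinations(n_residues: int, rotamers_per_residue: int = 5) -> float:
    """Order of magnitude of the search space, easier to read for large N."""
    return n_residues * math.log10(rotamers_per_residue)


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"N={n:>3}: about 10^{log10_combinations(n):.0f} combinations")
```

Even a small protein makes exhaustive enumeration hopeless, which is why rotamer libraries and iterative refinement (step 6) are used instead.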


Fold recognition

Which of the known folds is likely to be similar to the (unknown) fold of a new protein when only its amino-acid sequence is known?

Fraction of new folds (PDB new entries in 1998)

Koppensteiner et al., 2000, JMB 296:1139-1152.

Unrelated proteins adopt similar folds

Only 100 folds account for ~50% of all protein superfamilies

Possible explanations:

1. Divergent evolution
2. Convergent evolution
3. Limited number of folds
4. Misguided analysis

Proteins as seen by a Biologist

Does a new protein sequence belong to a given family of proteins (with a specific set of mutation rules)?

Fold recognition is based on:

• Sequence alignment, multiple sequence alignment

• Profile HMM, PSI-BLAST
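The family-membership question can be illustrated with a much-simplified profile. This sketch is neither a profile HMM nor PSI-BLAST: it builds an ungapped log-odds profile from a toy multiple sequence alignment and scores a query of the same length. The uniform background frequency, the pseudocount, and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    profile = []
    for col in range(length):
        # Count residues in this column; gap characters are simply ignored.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_sequence(profile, query):
    """Sum the per-column log-odds scores for an ungapped query (standard residues only)."""
    if len(query) != len(profile):
        raise ValueError("query length must match the profile length")
    return sum(column[aa] for column, aa in zip(profile, query))


if __name__ == "__main__":
    family = ["MKVLA", "MRVLA", "MKILS", "MKVLG"]  # toy alignment, not real data
    profile = build_profile(family)
    print(score_sequence(profile, "MKVLA"))  # family-like query
    print(score_sequence(profile, "WWWWW"))  # unrelated query
```

A higher score suggests the query fits the family's "mutation rules"; real tools add gaps, position-dependent transition probabilities, and calibrated significance estimates.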

Proteins as seen by a Physicist

“Thermodynamic hypothesis”: The native conformation of a protein corresponds to a global free energy minimum of the system (protein + solvent)

Naïve approach: having a correct energy function, search for the native structure in the conformational space

Threading

Threading: energy based fold recognition

Define:

1. Protein model and interaction description
2. Alignment algorithm
3. Energy parameterization

Pairwise energies E_ab between residue types a and b:

      A    C    D    E   ...
A    -3   -1    0    0   ...
C    -1   -4    1    2   ...
D     0    1    5    6   ...
E     0    2    6    7   ...
...

E = Σ E_ab, summed over positions i, j

Sequence: MAHFPGFGQSLLFGYPVYVFGD...

Potential fold:   1) ...    56) ...    n)
Energy:           -10 ...   -123 ...   20.5

Find best fold for a protein sequence:

Fold recognition (threading)
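The idea of scoring one sequence against many candidate folds can be sketched as follows. The pairwise energies are the four-residue excerpt shown on the slide. Everything else is hypothetical: each fold is reduced to a list of position pairs assumed to be in contact, the sequence is a toy built from A, C, D and E only, and the alignment step is skipped (the sequence is mounted on each fold without gaps).

```python
# Pairwise energies for residue types A, C, D, E (excerpt from the slide; symmetric).
ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a, b):
    """Look up the energy for an unordered residue pair."""
    return ENERGY[(a, b)] if (a, b) in ENERGY else ENERGY[(b, a)]


def threading_energy(sequence, contacts):
    """Sum pair energies over the position pairs (i, j) that are in contact in the fold."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def best_fold(sequence, folds):
    """Return the name of the lowest-energy fold and the energy of every fold."""
    energies = {name: threading_energy(sequence, contacts)
                for name, contacts in folds.items()}
    return min(energies, key=energies.get), energies


# Hypothetical fold library: contact lists over six residue positions.
FOLDS = {
    "fold_1": [(0, 3), (1, 4), (2, 5)],
    "fold_2": [(0, 5), (1, 4), (2, 3)],
    "fold_3": [(0, 2), (1, 3), (3, 5)],
}

if __name__ == "__main__":
    name, energies = best_fold("ACDACE", FOLDS)
    print(name, energies)
```

A real threading program would also search over sequence-to-structure alignments for each fold, which is the "alignment algorithm" component listed above.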

GenTHREADER (Jones, 1999, JMB 287:797-815)

Essentials of GenTHREADER

For each template:

• provide MSA

• align the query sequence with the MSA

• assess the alignment by sequence alignment score

• assess the alignment by pairwise potentials

• assess the alignment by solvation function

• record lengths of: alignment, query, template


Ab-initio folding

Goal: Predict structure from “first principles”

Requires:

• A free energy function, sufficiently close to the “true potential”

• A method for searching the conformational space

Benefits:

• Works for novel folds

• Shows that we understand the process

Ab-initio folding – the challenge

1. Current potential functions have limited accuracy

2. The conformational space is HUGE

Possible simplifications:

• Reduced representation

• Simplified potentials

• Coarse search strategies

Representation

Detailed representation: includes all atoms of the protein and the surrounding solvent; computationally expensive

• Implicit solvent models

• United atom representation

• Side-chain as centroid or Cα

• Restricted side-chain configurations (rotamers)

• Restricted backbone torsion angles

Rosetta [Simons et al. 1997]

• “Structural” signatures recur within protein structures

• Use these as cues during structure search

I-sites Library – a catalog of local sequence-structure correlations

Example motifs: serine hairpin, type-I hairpin, frayed helix

Rosetta: a folding simulation program

Fragment insertion Monte Carlo:

1. Choose a fragment
2. Change the backbone torsion angles
3. Convert to 3D
4. Evaluate the energy function
5. Accept or reject
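The fragment insertion loop can be sketched in a few lines. This is a toy under stated assumptions, not Rosetta's implementation: the fragment library and the energy function are hypothetical, the energy is evaluated directly on the torsion angles (so the "convert to 3D" step is skipped), and the accept-or-reject decision uses the Metropolis criterion, which the slides do not name.

```python
import math
import random

# Hypothetical fragment library: each fragment is three (phi, psi) pairs in degrees.
FRAGMENTS = [
    [(-60.0, -45.0)] * 3,                               # helix-like
    [(-120.0, 130.0)] * 3,                              # strand-like
    [(-60.0, -45.0), (-90.0, 0.0), (-120.0, 130.0)],    # mixed
]


def toy_energy(angles):
    """Placeholder score: scaled squared distance of each (phi, psi) from helix-like values."""
    return sum(((phi + 60.0) / 60.0) ** 2 + ((psi + 45.0) / 60.0) ** 2
               for phi, psi in angles)


def fragment_insertion_mc(n_residues, n_steps, temperature, rng):
    """Run fragment insertion Monte Carlo and return the final angles and energy."""
    angles = [(180.0, 180.0)] * n_residues  # start from an extended chain
    energy = toy_energy(angles)
    for _ in range(n_steps):
        # 1. Choose a fragment and an insertion window.
        fragment = rng.choice(FRAGMENTS)
        start = rng.randrange(n_residues - len(fragment) + 1)
        # 2. Change the backbone torsion angles in that window.
        trial = angles[:start] + fragment + angles[start + len(fragment):]
        # 3./4. Evaluate the energy (3D conversion omitted in this toy).
        trial_energy = toy_energy(trial)
        # 5. Accept or reject (Metropolis criterion, an assumption here).
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, energy = trial, trial_energy
    return angles, energy


if __name__ == "__main__":
    final_angles, final_energy = fragment_insertion_mc(12, 500, 0.5, random.Random(0))
    print(final_energy)
```

Because moves replace whole windows of torsion angles with values seen in known structures, the search stays in locally plausible regions of conformational space.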

Potential functions

• Molecular mechanics: models the forces that determine protein conformation
   • Van der Waals: Lennard-Jones 12-6
   • Electrostatic: Coulomb’s law

• Scoring functions: empirically derived from solved structures
   • Useful with reduced complexity models
   • Useful in treating aspects of protein thermodynamics
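The two molecular mechanics terms named above have simple closed forms. The sketch below writes them for a single pair of atoms, assuming distances in Å, charges in units of the elementary charge, and energies in kcal/mol. The ε and σ values in the usage lines are illustrative placeholders, not parameters from any particular force field.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2), the usual molecular-mechanics units


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb energy between two point charges separated by distance r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


if __name__ == "__main__":
    epsilon, sigma = 0.2, 3.4  # placeholder well depth (kcal/mol) and diameter (Angstrom)
    for r in (3.0, 3.4, 3.8, 5.0):
        print(r, lennard_jones(r, epsilon, sigma), coulomb(r, 1.0, -1.0))
```

The Lennard-Jones term is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6)·σ; the Coulomb term is negative for opposite charges, matching the preference for positive groups to sit against negative ones.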

Search methods

• Molecular dynamics: simulates the motion of a molecule in a given potential
   • Impractical …

• Coarse sampling of energy landscape:
   • Simulated annealing, genetic algorithms, …
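Simulated annealing, one of the coarse sampling strategies listed, can be shown on a one-dimensional stand-in for an energy landscape. The landscape, the geometric cooling schedule, and all parameter values below are assumptions chosen for illustration; a real application would move through torsion-angle or coordinate space instead of along a single variable.

```python
import math
import random


def rugged_energy(x):
    """Hypothetical landscape: a broad well around x = 2 with ripples (local minima)."""
    return (x - 2.0) ** 2 + 2.0 * math.sin(5.0 * x)


def simulated_annealing(energy, x0, rng, n_steps=5000,
                        t_start=5.0, t_end=0.01, step=0.5):
    """Minimize `energy` starting at x0; return the best point seen and its energy."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        candidate = x + rng.uniform(-step, step)
        candidate_e = energy(candidate)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if candidate_e <= e or rng.random() < math.exp(-(candidate_e - e) / t):
            x, e = candidate, candidate_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    best_x, best_e = simulated_annealing(rugged_energy, -5.0, random.Random(1))
    print(best_x, best_e)
```

At high temperature the walker crosses barriers between local minima; as the temperature drops it settles into one basin, trading exhaustive coverage for tractable run time.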
