Prediction-based fingerprints of protein-protein interactions
Protein Prediction
-
Upload
vanidarling -
Category
Documents
-
view
231 -
download
0
Transcript of Protein Prediction
-
7/30/2019 Protein Prediction
1/100
Protein Secondary Structure PredictionPSSP
-
7/30/2019 Protein Prediction
2/100
Proteins
Protein: from the Greek word PROTEUO which means "tobe first (in rank or influence)"
Why are proteins important to us:
Proteins make up about 15% of the mass of the average person
and maintain the structural integrity of the cell.
Enzyme acts as a biological catalyst
Storage and transportHaemoglobin
Antibodies
Hormones Insulin
-
7/30/2019 Protein Prediction
3/100
Introduction to proteins
Peptide
Bond
-
7/30/2019 Protein Prediction
4/100
Four levels of protein
structure
-
7/30/2019 Protein Prediction
5/100
Conformational Parameters of secondary structure of
a protein
Dihedral Angles/torsion angles/rotation angles
=phi= N-C
=psi= C-C
= omega=C-N
-
7/30/2019 Protein Prediction
6/100
These values can be calculated?
Ramachandran Plot: Founded by G.N.Ramachandran.
Green region indicates the
stericially permitted & values except Gly and Pro.
Yellow circles represent the
conformational angles of several
secondary structures..-helix,parallel & anti parallel -sheet
-
7/30/2019 Protein Prediction
7/100
Helices
-
7/30/2019 Protein Prediction
8/100
Helices
H: - helix
G: 310 helix
I: - helix (extremely rare)
-
7/30/2019 Protein Prediction
9/100
Hydrogen bondingi-i+3th i-i+4th i-i+5th
-
7/30/2019 Protein Prediction
10/100
Secondary Structure
8 different categories (DSSP): H: - helix
G: 310 helix
I: - helix (extremely rare)
E: - strand
B: - bridge
T: - turn
S: bend
L: the rest
(pitch 5.4 A0)
1.5 A0
-
7/30/2019 Protein Prediction
11/100
Three secondary structure states
Prediction methods are normally trained andassessed for only 3 states (residues):
H (helix), E (strands) and L (coil)
There are many published 8-to-3 states reduction
methods Standard reduction methods are defined by
programsDSSP (Dictionary of SS of Proteins),STRIDE, and DEFINE
Improvement of predictive accuracy of different SSP(Secondary Structure Prediction) programs depends
on the choice of the reduction method
-
7/30/2019 Protein Prediction
12/100
Protein Secondary Structure Prediction
Techniques for the prediction of protein secondarystructure provide information that is useful both in
ab initio structure prediction and
as an additional constraint for fold-recognition
algorithms. Knowledge of secondary structure alone can help the
design of site-directed or deletion mutants that will
not destroy the native protein structure.
For all these applications it is essential that thesecondary structure prediction be accurate, or at
least that, the reliability for each residue can be
assessed.
-
7/30/2019 Protein Prediction
13/100
Protein Secondary Structure Prediction If a protein sequence shows clear similarity to a
protein of known three dimensional structure, then
the most accurate method of predicting the
secondary structure is to align the sequences by
standard dynamic programming algorithms, as the
homology modelling is much more accurate thansecondary structure prediction for high levels of
sequence identity.
Secondary structure prediction methods are of most
use when sequence similarity to a protein of knownstructure is undetectable.
It is important that there is no detectable sequence
similarity between sequences used to train and test
secondary structure prediction methods.
-
7/30/2019 Protein Prediction
14/100
Protein Secondary Structure
Secondary Structure
Regular
Secondary
Structure
( -helices, -
sheets)
Irregular
Secondary
Structure
(Tight turns,
Random coils,
bulges)
-
7/30/2019 Protein Prediction
15/100
PSSP Algorithms
There are three generations in PSSP
algorithms
Early/First Generation: based onstatistical/rule basedinformation of single
aminoacids
Second Generation: based on windows(segments) of aminoacids. Typically a window
containes 11-21 aminoacids
Third Generation: based on the use of
windows on evolutionary information
-
7/30/2019 Protein Prediction
16/100
PSSP: First Generation
First generation PSSP systems are based on
statistical information on a single aminoacid
The most relevant algorithms:
Chow-Fasman, 1974 (Statistics based)
GOR, 1978 (Rule based)
Both algorithms claimed 74-78% of predictive
accuracy, but tested with better constructed datasetswere proved to have the predictive accuracy ~50%
(Nishikawa, 1983)
-
7/30/2019 Protein Prediction
17/100
PSSP: Second Generation
Based on the information contained in awindow of aminoacids (11-21 aa.)
The most systems use algorithms based on: Statistical information
Physico-chemical properties
Sequence patterns
Multi-layered neural networks
Graph-theory Multivariante statistics
Expert rules
Nearest-neighbour algorithms
No Bayesian networks
-
7/30/2019 Protein Prediction
18/100
PSSP: Second Generation
Main problems:
Prediction accuracy
-
7/30/2019 Protein Prediction
19/100
PSSP: Third Generation
PHD: First algorithm in this generation (1994) Evolutionary information improves the prediction accuracy to
72%
Use of evolutionary information:
1. Scan a database with known sequences with alignmentmethods for finding similar sequences
2. Filter the previous list with a threshold to identify themost significant sequences
3. Build aminoacid exchangeprofiles based on theprobable homologs (most significant sequences)
4. The profiles are used in the prediction, i.e. in buildingthe classifier
-
7/30/2019 Protein Prediction
20/100
PSSP: Third Generation
Many of the second generation algorithms have been
updated tothird generation The most important algorithms of today
Predator: Nearest-neighbour
PSI-Pred: Neural networks
SSPro: Neural networks
SAM-T02: Homologs (Hidden Markov Models)
PHD: Neural networks
Due to the improvement of protein information indatabases i.e. better evolutionary information, todays
predictive accuracy is ~80%
It is believed that maximum reachable accuracy is
88%
-
7/30/2019 Protein Prediction
21/100
First Generation PSSP
Two classical methods that use previously
determined propensities:
Chou-Fasman
Garnier-Osguthorpe-Robson
-
7/30/2019 Protein Prediction
22/100
Chou-Fasman method
Uses table of conformational parameters
(propensities) determined primarily frommeasurements of secondary structure.
Frequency of amino acid X observed in element Y
Frequency of element Y in database
-
7/30/2019 Protein Prediction
23/100
-
7/30/2019 Protein Prediction
24/100
-
7/30/2019 Protein Prediction
25/100
Designations:H = Strong Former,
h = Former,
I = Weak Former,
i = Indifferent,
B = Strong Breaker,
b = Breaker;
P = Conformational
Parameter
-
7/30/2019 Protein Prediction
26/100
The Chou-Fasman method
If you were asked to determine whether an amino
acid in a protein of interest is part of a -helix or -
sheet, you might think to look in a protein database
and see which secondary structures amino acids insimilar contexts belonged to.
The Chou-Fasman method (1974) is a combination of
such statistics-based methods and rule-basedmethods.
-
7/30/2019 Protein Prediction
27/100
Steps of the Chou-Fasman algorithm:
1. Calculate propensities from a set of solvedstructures. For all 20 amino acids i,calculate these
propensities by:
)(
)/(
)(
)/(
)(
)/(
iP
TurniP
iP
BetaiP
iP
HelixiP
Propensities > 1 mean that the residue type I is likely to be found in theCorresponding secondary structure type.
-
7/30/2019 Protein Prediction
28/100
Amino Acid -Helix -Sheet TurnAla 1.29 0.90 0.78
Cys 1.11 0.74 0.80
Leu 1.30 1.02 0.59
Met 1.47 0.97 0.39
Glu 1.44 0.75 1.00
Gln 1.27 0.80 0.97
His 1.22 1.08 0.69Lys 1.23 0.77 0.96
Val 0.91 1.49 0.47
Ile 0.97 1.45 0.51
Phe 1.07 1.32 0.58
Tyr 0.72 1.25 1.05
Trp 0.99 1.14 0.75
Thr 0.82 1.21 1.03
Gly 0.56 0.92 1.64
Ser 0.82 0.95 1.33
Asp 1.04 0.72 1.41
Asn 0.90 0.76 1.23
Pro 0.52 0.64 1.91
Arg 0.96 0.99 0.88
Chou and Fasman
Favors
-Helix
Favors
-strand
Favors
turn
-
7/30/2019 Protein Prediction
29/100
Chou and Fasman
Predicting helices:- find nucleation site: 4 out of 6 contiguous residues with P()>1
- extension: extend helix in both directions until a set of 4 contiguous
residues has an average P() < 1 (breaker)
- if average P() over whole region is >1, it is predicted to be helical
Predicting strands:
- find nucleation site: 3 out of 5 contiguous residues with P()>1
- extension: extend strand in both directions until a set of 4 contiguousresidues has an average P() < 1 (breaker)
- if average P() over whole region is >1, it is predicted to be a strand
-
7/30/2019 Protein Prediction
30/100
2. Once the propensities are calculated, each aminoacid is categorized using the propensities as one of:
Each amino acid is also categorized as one of:
helix-former,
helix-breaker, or
helix-indifferent.
(That is, helix-formers have high helical propensities, helix-breakershave low helical propensities, and helix-indifferent haveintermediate propensities.)
sheet-former,
sheet-breaker, or
sheet-indifferent.
For example, it was found (as expected) that glycine and prolines
are helix-breakers.
-
7/30/2019 Protein Prediction
31/100
3.Find nucleation sites.
These are short subsequences with a high-concentration ofhelix-formers (or sheet-formers).
These sites are found with some heuristic rule (e.g. a sequenceof 6 amino acids with at least 4 helix-formers, and no helix-breakers").
4. Extend the nucleation sites, adding residues at theends, maintaining an average propensity greater thansome threshold.
5. Step 4 may create overlaps; Finally, we deal withthese overlaps using some heuristic rules.
Th GOR th d
-
7/30/2019 Protein Prediction
32/100
The GOR method(Garnier, Osguthorpe, Robson)
Position-dependent propensities for helix, sheet or turn is calculated for each amino
acid. For each position j in the sequence, eight residues on either side areconsidered.
A helix propensity table contains information about propensity for residues at 17
positions when the conformation of residue j is helical. The helix propensity tables
have 20 x 17 entries.
Build similar tables for strands and turns.
GOR simplification:
The predicted state of AAj is calculated as the sum of the position-dependent
propensities of all residues around AAj.
GOR can be used at : http://abs.cit.nih.gov/gor/(current version is GOR IV)
j
http://abs.cit.nih.gov/gor/http://abs.cit.nih.gov/gor/ -
7/30/2019 Protein Prediction
33/100
Suppose aj is the amino acid that we are trying tocategorize.
GOR looks at the residues
Intuitively, it assigns a structure based on probabilities it has
calculated from protein databases. These probabilities are of the
form
-
7/30/2019 Protein Prediction
34/100
Accuracy
Both Chou and Fasman and GOR have been
assessed and their accuracy is estimated to
be Q3=60-65%.
(initially, higher scores were reported, but theexperiments set to measure Q3 were flawed,
as the test cases included proteins used toderive the propensities!)
-
7/30/2019 Protein Prediction
35/100
Nearest Neighbour Method
This method depends on the spatial structure of
central residues in each window having kowledge of
its neighbours.
This methods is hence called as Memory/Homology
based method.
Steps
-
7/30/2019 Protein Prediction
36/100
Steps Computes MS Algorithm
Computes the distance between homologous
sections
Example : NNSSP
Training Data
Test data
New data
NN Tree(Binary search tree)
Validation
Prediction
Actual NN
tree with data
Modelling scheme of the nerest-neighbour method
SDV Hyperplanes
DS manipulation
-
7/30/2019 Protein Prediction
37/100
Neural Networks
Single sequence methods - train network using sets
of known proteins of certain types (all alpha, all beta,
alpha+beta) then use to predict for query sequence
NNPREDICT (>70% accuracy)
-
7/30/2019 Protein Prediction
38/100
NEURAL NETWORKS
-
7/30/2019 Protein Prediction
39/100
Inspired by the brain
Traditional computers struggle to
recognize and generalize patterns
of the past for future actions
Brain as an information processing
system contains 10 billion nerve
cells or neurons and each neuron isconnected to other neuron through
about 10,000 synapses
-
7/30/2019 Protein Prediction
40/100
Brain
Interconnected networkof that
collect, process anddisseminate electricalsignals via
Neurons
Synapses
Neural Network
Interconnected network of
units (or nodes) that
collect, process anddisseminate values via
links
Nodes
Links
-
7/30/2019 Protein Prediction
41/100
Scheme for modeling Neural Network:
Building a random network Training the net on a training set
-
7/30/2019 Protein Prediction
42/100
Building a random network
random selection of the type of node
random selection of the parameters of the node
random selection of the number of the inputs
connecting the inputs and outputs until the netis larger
running the training set over the net
selecting the proper outputremoval of all nodes which do not contribute to
the output
-
7/30/2019 Protein Prediction
43/100
Training the network
The general idea behind this is to run the net on the
training set, and every time it gives a right answer
leave it as it is and or strengthen the path. Every time
it makes a mistake penalize it. This can be done inseveral ways to incrementally modify the net:
alter the parameters
add/delete connections
add/delete the nodes with their connections
-
7/30/2019 Protein Prediction
44/100
T i l th d l d t t i f d f d
-
7/30/2019 Protein Prediction
45/100
Typical methodology used to train a feed-forwardnetwork for secondary structure prediction is based onQian and Sejnowski, 1988
Alanine: 100000000
Helix : 100000000
-
7/30/2019 Protein Prediction
46/100
Different Network Topologies
Single layer feed-forward networks
Input layer projecting into the output layer
Input Output
layer layer
Single layer
network
-
7/30/2019 Protein Prediction
47/100
Different Network Topologies
Multi-layer feed-forward networks
One or more hidden layers. Input projects only
from previous layers onto a layer.
Input Hidden Output
layer layer layer
2-layer or
1-hidden layer
fully connected
network
-
7/30/2019 Protein Prediction
48/100
Different Network Topologies
Recurrent networks
A network with feedback, where some of its
inputs are connected to some of its outputs(discrete time).
Input Output
layer layer
Recurrent
network
-
7/30/2019 Protein Prediction
49/100
Features
A typical training set consists of 100 non-homologous protein chains (15,000 training patterns)
A net with an input window of 17, five hidden nodes
in a single hidden layer and three outputs will have
357 input nodes and 1,808 weights.
Predictions are made on a winner-takes-all basis
-
7/30/2019 Protein Prediction
50/100
PHD (Profile network from HeiDelberg)
A program with several cascading neural networks. The method employs a two-layered feed-forward
neural network
Sequence to structure (First layer)
Structure to Structure (Second layer)
> 70%
Window length is 13, and for every position in the window
frequencies for 20 aa is calculated (20x13)
Based on this the OUTPUT is a probability of the three
possible classes (H,E and C)
-
7/30/2019 Protein Prediction
51/100
Evaluating Prediction Efficiency
Jackknief test:
Percentage of correctly classified residues:
Correlation coefficient for each target class:
-
7/30/2019 Protein Prediction
52/100
Prediction of Transmembrane helices
Two Ways:
One is solely based on the construction principles of
proteins associated with physico-chemical properties
of amino acids. No concept of training is involved.
The other is to collect data sets with known
structures, extract features and use machine learning
algorithms for predictions.
-
7/30/2019 Protein Prediction
53/100
Outline
1. Importance of Transmembrane Proteins
2. General Topologies
3. Methods (and challenges) for Structural
Studies of TM Proteins
-
7/30/2019 Protein Prediction
54/100
Eukaryotic cells have many membranes
-
7/30/2019 Protein Prediction
55/100
Transmembrane Proteins
v Cellular roles include:Communication between cells
Communications between organelles and cytosolIon transport, Nutrient transport
Links to extracellular matrixReceptors for viruses
Connections for cytoskeletonv Over 25% of proteins in complete genomes.
v Key roles in diabetes, hypertension, depression,arthritis, cancer, and many other common diseases.v Targets for over 75% of pharmaceuticals.
-
7/30/2019 Protein Prediction
56/100
Transmembrane Proteins
v Cellular roles include:Communication between cells
Communications between organelles and cytosolIon transport, Nutrient transportLinks to extracellular matrix
Receptors for virusesConnections for cytoskeleton
v Over 25% of proteins in complete genomes.v Key roles in diabetes, hypertension, depression,
arthritis, cancer, and many other common diseases.v Targets for over 75% of pharmaceuticals.
However, very few TM protein structures have been solved!
Bi l i l M b Li id Bil
-
7/30/2019 Protein Prediction
57/100
Biological Membrane = Lipid Bilayer
Approximately 30 thickHydrophobic core + Hydrophilic or charged headgroups
Mixture of lipids that vary in type of head groups, lengths of acyl chains,number of double bonds
(Some membranes also contain cholesterol)
M b Bil ith P t i
-
7/30/2019 Protein Prediction
58/100
Membrane Bilayer with Proteins
In order to be stable in this environment, a polypeptide chain needs to
(1) contain a lot of amino acids with hydrophobic sidechains, and
(2) fold up to satisfy backbone H-bond propensity - How?
-
7/30/2019 Protein Prediction
59/100
Structure Solution #1: Hydrophobic alpha-
helix
Satisfies polypeptide
backbone hydrogen
bonding Hydrophobic
sidechains face
outward into lipids
-
7/30/2019 Protein Prediction
60/100
Examples of Helix Bundle TM Proteins
PDB = 1QHJ PDB = 1RRC
Single helix or helical bundles (> 90% of TM proteins)Examples: Human growth hormone receptor, Insulin receptor
ATP binding cassette family - CFTRMultidrug resistance proteins
7TM receptors - G protein-linked receptors
-
7/30/2019 Protein Prediction
61/100
Structure solution #2
Beta-barrel
Beta sheet satisfies
backbone hydrogen
bonds between
strands
Wrap sheet around
into barrel shape
Sidechains on theoutside of the barrel
are hydrophobic
E l f B t B l TM P t i
-
7/30/2019 Protein Prediction
62/100
Examples of Beta Barrel TM Proteins
PDB = 1EK9 PDB = 2POR
Beta barrels - in outer membrane of gram negative bacteria,and some nonconstitutive membrane acting toxins
Examples: Porins
General Topologies of TM Proteins
-
7/30/2019 Protein Prediction
63/100
General Topologies of TM Proteins
Single helix or helical bundles and Beta barrelsBoth topologies result in
hydrophobic surfaces facing acyl chains of lipidsPart protruding from membrane can be a very short sequence (a few
amino acids), a loop, or large, independently folding domains
-
7/30/2019 Protein Prediction
64/100
Presence of Hydrophobic TM
Domain can result in:Low levels of expressionDifficulties in solubilization
Difficulties in crystallization
Attempting crystallization and structure solution of
transmembrane proteins is considered difficult and
risky.
Diffi lt d i k b t till ibl
-
7/30/2019 Protein Prediction
65/100
Difficult and risky, but still possible:TM Proteins of Known Structure
Great summary and resource:http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html
Bacteriorhodopsin, RhodopsinPhotosynthetic reaction centers
PorinsLight harvesting complexes
Potassium channelsChloride channelsAquaporin
TransportersEtc.
**Although few in number, each of these structureshave been important for addressing key functions.***
-
7/30/2019 Protein Prediction
66/100
Protein Folding Problem
How does a one-dimensional amino
acid sequence determine a specific
three-dimensional structure?
Or
How can we read the sequence andpredict that structure?
-
7/30/2019 Protein Prediction
67/100
General Idea
We know what an alpha-helix or a beta strand
looks like, so
(1) figure out which parts of the sequence arehelices and which parts are strands
(2) figure out how they pack together
For soluble proteins, neither is well predicted.
But for transmembrane proteins ...
TM P t i St t P di ti St #1
-
7/30/2019 Protein Prediction
68/100
TM Protein Structure Prediction, Step #1For alpha-helical transmembrane proteins, hydropathy plot analysisprovides a fairly accurate method to predict which amino acids form
membrane-spanning helices
We can model the structure of anindividual alpha helix fairlyaccurately.
-
7/30/2019 Protein Prediction
69/100
TM Protein Structure Prediction, Step #2
How do the helices pack in the membrane?There are several labs studying known protein structures to identify factorsinvolved in determining how transmembrane helices pack together(specificity of interaction and packing motifs)
Hydrogen bonds
HydrophobiciityAmino acids known to face the lumen of a channelMultiple sequence alignments
Helix packing sequence motifs, etc.These kinds of information are then combined with protein docking and
energy minimization programs to predict how the helices pack together.
It is quite possible that studies of helical transmembrane proteins couldlead to key information about the protein folding problem - how to predictprotein structure from amino acid sequence
-
7/30/2019 Protein Prediction
70/100
collect data sets with known
Build Model
Align using clustralW/hmmalign
Hmmbuild
Hmmsearch
Calculate threshold
Score distribution and Training data (NN)Validation
Prediction using Machine learning techniques
-
7/30/2019 Protein Prediction
71/100
Summary Transmembrane Proteins play many important processes in
cellular processes in both health and disease
Two general type of tertiary structure are found to cross themembranes: beta-barrels and alpha-helices
Structural Studies of TM Proteins are impeded by difficulties in
overexpression, purification and crystallization However, the few dozen structures that have been determined
have provided key information about channels (gating,selectivity, etc.), energetics, transport, and othertransmembrane processes
Analysis of helical transmembrane protein structures may leadto accurate predictions of protein structure from amino acidsequence for this type of protein
-
7/30/2019 Protein Prediction
72/100
Prediction of protein conformations
from protein sequences
(3D Prediction)
Protein Conformations
-
7/30/2019 Protein Prediction
73/100
73
Predict protein 3D structure from (amino acid) sequence
Sequence secondary structure 3D structure
function
-
7/30/2019 Protein Prediction
74/100
74
-
7/30/2019 Protein Prediction
75/100
75
Protein 3D Structure Detection
X-ray Crys
NMR
Expensive
Slow
-
7/30/2019 Protein Prediction
76/100
76
Protein Structure
Protein 3D structure biological function
Lock & key model of enzyme function (docking)
Folding problem
protein sequence 3D structure
Structure prediction and alignment
Protein design, drug design, etc
The holy grailof bioinformatics
-
7/30/2019 Protein Prediction
77/100
77
The Prediction Problem
Can we predict the final 3D protein structure knowing
only its amino acid sequence?
Studied for 4 Decades
Primary Motivation for Bioinformatics
Based on this 1-to-1 Mapping of Sequence to
Structure
Still very much an OPEN PROBLEM
-
7/30/2019 Protein Prediction
78/100
78
-
7/30/2019 Protein Prediction
79/100
79
Predicting Protein Structure
Goal Find best fit of sequence to 3D structure
Comparative (homology) modeling
Construct 3D model from alignment to protein sequences
with known structure Threading (fold recognition)
Pick best fit to sequences of known 2D / 3D structures (folds)
Ab initio / de novo methods
Attempt to calculate 3D structure from scratch Molecular dynamics
Energy minimization
Lattice models
-
7/30/2019 Protein Prediction
80/100
80
PSP: Goals
Accurate 3D structures. But not there yet.
Good guesses
Working models for researchers
Understand the FOLDING PROCESS
Get into the Black Box
Only hope for some proteins
25% wont crystallize, too big for NMR Best hope for novel protein engineering
Drug design, etc.
-
7/30/2019 Protein Prediction
81/100
81
PSP: Major Hurdles
Energetics We dont know all the forces involved in detail
Too computationally expensive BY FAR!
Conformational search impossibly large
100 a.a. protein, 2 moving dihedrals, 2 possible positions for
each diheral: 2200
conformations!
Levinthals Paradox
Longer than time of universe to search
Proteins fold in a couple of seconds??
Multiple-minima problem
-
7/30/2019 Protein Prediction
82/100
82
Tertiary Structure Prediction
Major Techniques
Comparative Modeling
Homology Modeling
Threading
Template-Free Modeling
De novo/ab initioMethods
Physics-Based
Knowledge-Based
Homology Modeling
-
7/30/2019 Protein Prediction
83/100
83
Homology Modeling
-
7/30/2019 Protein Prediction
84/100
84
Steps
Template selection
Target template alignment
Model building Evaluation
Repeated until a satisfactory
model structure is achieved
-
7/30/2019 Protein Prediction
85/100
85
Threading
a library of protein folds (templates)
a scoring function to measure the fitness of a
sequence -> structure alignment
a search technique for finding the best alignmentbetween a fixed sequence and structure
a means of choosing the best fold from among the
best scoring alignments of a sequence to all possible
folds
-
7/30/2019 Protein Prediction
86/100
86
ab initioMethods
The ab initio approach (Figure 6.25) ignores
-
7/30/2019 Protein Prediction
87/100
87
pp ( g ) g
sequence homology and attempts to predict the
folded state from fundamental energetics orphysicochemical properties associated with the
constituent residues. This involves modelling
physicochemical parameters in terms of forcefields that direct the folding. These constraints will
reflect the energetics associated with charge,
hydrophobicity and polarity with the aim being tofind a single structure oflow energy.
How to define the energy of a PROTEIN?
-
7/30/2019 Protein Prediction
88/100
88
How to define the energy of a PROTEIN?
How to find the conformations for which the energy is
minimum?
This approach is based on the thermodynamic argument that
the native structure of a protein is the global minimum in
the free energy profile.
Generally the results are expressed as rmsd (root mean
square deviations), reflecting the difference in positions
between corresponding atoms in the experimental and
calculated (predicted) structures.
-
7/30/2019 Protein Prediction
89/100
89
What make global minimum?
1) Semi imperical potential function
Which is calculated as the sum of all the possible pair
wise interactions between the atoms in the molecule
Eg: AMBER force field
Evaluates the N(N-1)/2 atom-atom interaction which
requires computational time N square.
Most of the computational methods look for the
Global minimum??
SEMI IMPERICAL POTENTIAL FUNCTION
-
7/30/2019 Protein Prediction
90/100
90
structure structure
E
N
ER
G
Y
E
N
E
R
G
Y
SCHEMATIC DIAGRAM OF DIFFERENT TYPES OF
ENERGY FUNCTIONS
Global Minimum Global Minimum
N(N-1)/2 Atom-atom Interaction
Eg : AMBER Force Fields
-
7/30/2019 Protein Prediction
91/100
To over come this reduce the resolution at which the
potential function is calculated.
Instead of atom-atom potential The United atompotential would be an approximation.
This approximation is also called as a Pseudo atom
-
7/30/2019 Protein Prediction
92/100
92
2) UNRES force field (Residues with Solvent)
UNRES was originally designed and parameterizedto locate native-like structures of proteins as thelowest in potential energy by unrestricted globaloptimization.
Propensities of amino acids calculated.
the backbone is represented as a sequence of -carbon (C) atoms linked by virtual bonds designatedas dC, with united peptide groups (ps) in theircenters.
-
7/30/2019 Protein Prediction
93/100
93
Molecular Dynamics
-
7/30/2019 Protein Prediction
94/100
94
Computation of dynamics or motion of a ptn.
1) The physical forces which influence the folding
process are well represented by the semi empirical
force fields.
2) The atoms of the ptn move independently uponinduction of the force fields..
Calculated by Newtons laws of motion:-
F=ma (calculated for each fematoseconds)..
-
7/30/2019 Protein Prediction
95/100
Motion influenced by temperature??
-
7/30/2019 Protein Prediction
96/100
96
Stimulated annealing:-
Higher temperatures the motion will be greater in ashorter period of time.
1) Initial global minimum of the atom is calculated first.
2) Temperature is raised to 3000K for a fewFemtoseconds and then gradually lowered to roomtemperature 300K
The idea is to even if you start to predict the protein inwith wrong structure (at 3000K) final corrections willlead to conformation.
Molecular dynamics Optimization tool
-
7/30/2019 Protein Prediction
97/100
97
energy
Conformational space
Trajectory
Oth h i l ti
-
7/30/2019 Protein Prediction
98/100
98
Other such simulations are ..
MONTE CARLO SIMULATIONS
CONFORMATIONAL SPACE ANNEALING
ROSETTA ALGORITHM
CASP (Critical assessment of structure
prediction)
LEVINTHAL PARODOX etc
ROSETTA ALGORITHM
-
7/30/2019 Protein Prediction
99/100
Break target sequence into fragments of 9 amino acids
Create profile , X, for target
Create profile, S, for similar PDB sequences
Align profiles X, S to get best match fragment
Use fragments as starting point for optimisation, using:
- hydrophobic burial
- polar side-chain interactions
- hydrogen bonding between beta-strands
- hard sphere repulsion (van der Waals)
Create 1000 structures, and
Choose cluster centre as the
best prediction
-
7/30/2019 Protein Prediction
100/100