Proteins Secondary Structure Predictions
![Page 1: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/1.jpg)
Proteins Secondary Structure Predictions
Structural Bioinformatics
![Page 2: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/2.jpg)
Structure Prediction Motivation
• Better understand protein function
• Broaden homology
  – Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)
• Explain disease
  – Explain the effect of mutations
  – Design drugs
![Page 3: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/3.jpg)
Myoglobin: the first high resolution protein structure
Solved in 1958 by John Kendrew and colleagues at Cambridge University. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.
“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
![Page 4: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/4.jpg)
Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).
However, we can predict the secondary structure with relatively high precision.
MERFGYTRAANCEAP….
![Page 5: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/5.jpg)
What do we mean by Secondary Structure?
Secondary structure elements are the building blocks of the protein structure.
![Page 6: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/6.jpg)
What do we mean by Secondary Structure?
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop
![Page 7: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/7.jpg)
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn.
• Stabilized by H-bonds.
[Figure: alpha helix; one turn = 3.6 residues = 5.4 Å]
![Page 8: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/8.jpg)
Beta Strand: Pauling and Corey (1951)
• Different polypeptide chains, or different segments of the same chain, run alongside each other and are linked together by hydrogen bonds.
• Each section is called a β-strand and consists of 5-10 amino acids.
![Page 9: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/9.jpg)
Beta Sheet
The strands lie adjacent to each other, forming a beta sheet.
[Figure: antiparallel (3.47 Å, 4.6 Å) and parallel (3.25 Å, 4.6 Å) beta sheets]
![Page 10: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/10.jpg)
Loops
• Connect the secondary structure elements.
• Have various lengths and shapes.
• Located at the surface of the folded protein and therefore may have an important role in biological recognition processes.
![Page 11: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/11.jpg)
Three-Dimensional Tertiary Structure
Describes the packing of alpha helices, beta sheets and random coils with respect to each other at the level of one whole polypeptide chain.
![Page 12: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/12.jpg)
[Figure: secondary and tertiary structure of RBP and globin]
![Page 13: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/13.jpg)
How do the (secondary and tertiary) structures relate to the primary protein sequence?
![Page 14: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/14.jpg)
- Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen).
- Protein structure is more conserved than protein sequence and more closely related to function.
[Figure: SEQUENCE and STRUCTURE]
![Page 15: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/15.jpg)
How Can Different Amino Acid Sequences Determine Similar Protein Structure?
Lesk and Chothia 1980
![Page 16: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/16.jpg)
The Globin Family
![Page 17: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/17.jpg)
Different sequences can result in similar structures
[Figure: structures of 1ecd and 2hhd]
![Page 18: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/18.jpg)
Can we learn about the important features that determine structure and function by comparing the sequences and structures?
![Page 19: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/19.jpg)
The Globin Family
![Page 20: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/20.jpg)
Why is Proline 36 conserved throughout the globin family?
![Page 21: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/21.jpg)
Where are the gaps?
The gaps in the pairwise alignment map to the loop regions.
![Page 22: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/22.jpg)
How are remote homologs related in terms of their structure?
[Figure labels: retinol-binding protein, odorant-binding protein, apolipoprotein D, β-lactoglobulin, RBD]
![Page 23: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/23.jpg)
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

    Score = 159 bits (404), Expect = 1e-38
    Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

    Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
                V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
    Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

    Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
               + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
    Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

    Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                    +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
    Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
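The Identities and Gaps figures in the PSI-BLAST header can be recomputed directly from the aligned rows. The following is a minimal Python sketch; the function name `alignment_stats` is hypothetical, and the two strings are the Query and Sbjct rows of the alignment above joined end to end. Positives are not recomputed, because that would need a substitution matrix that the slide does not give.

```python
# Aligned rows copied from the PSI-BLAST output; '-' marks a gap.
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(query: str, sbjct: str) -> dict:
    """Count alignment columns, identical columns and gapped columns."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    columns = len(query)
    # A column is an identity when both rows hold the same residue.
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    # A column is a gap when either row holds '-'.
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return {
        "columns": columns,
        "identities": identities,
        "gaps": gaps,
        "identity_pct": round(100 * identities / columns),
        "gap_pct": round(100 * gaps / columns),
    }


if __name__ == "__main__":
    print(alignment_stats(QUERY, SBJCT))
```

An identity of about a quarter of the columns is what makes this pair a remote homology: the relationship is hard to see from a single pairwise comparison.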
![Page 24: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/24.jpg)
The Retinol Binding Protein and β-lactoglobulin
![Page 25: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/25.jpg)
Structure Prediction: Motivation
• Hundreds of thousands of gene sequences translated to proteins (GenBank, Swiss-Prot, PIR)
• Only ~50,000 solved protein structures
• Experimental methods are time consuming and not always possible
• Goal: Predict protein structure based on sequence information
![Page 26: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/26.jpg)
Prediction Approaches
• Two-stage
  1. Primary (sequence) to secondary structure
  2. Secondary to tertiary structure
• One-stage
  – Primary to tertiary structure
![Page 27: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/27.jpg)
According to the most simplified model:
• In a first step, the secondary structure is predicted based on the sequence.
• The secondary structure elements are then arranged to produce the tertiary structure, i.e. the structure of a protein chain.
• For molecules which are composed of different subunits, the protein chains are arranged to form the quaternary structure.
![Page 28: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/28.jpg)
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt?
![Page 29: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/29.jpg)
Secondary Structure Prediction Methods
• Chou-Fasman / GOR Method
  – Based on amino acid frequencies
• Machine learning methods
  – PHDsec and PSIpred
• HMM (Hidden Markov Model)
![Page 30: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/30.jpg)
Chou and Fasman (1974)

| Name | P(a) (alpha helix) | P(b) (beta strand) | P(turn) |
|---|---|---|---|
| Alanine | 142 | 83 | 66 |
| Arginine | 98 | 93 | 95 |
| Aspartic Acid | 101 | 54 | 146 |
| Asparagine | 67 | 89 | 156 |
| Cysteine | 70 | 119 | 119 |
| Glutamic Acid | 151 | 37 | 74 |
| Glutamine | 111 | 110 | 98 |
| Glycine | 57 | 75 | 156 |
| Histidine | 100 | 87 | 95 |
| Isoleucine | 108 | 160 | 47 |
| Leucine | 121 | 130 | 59 |
| Lysine | 114 | 74 | 101 |
| Methionine | 145 | 105 | 60 |
| Phenylalanine | 113 | 138 | 60 |
| Proline | 57 | 55 | 152 |
| Serine | 77 | 75 | 143 |
| Threonine | 83 | 119 | 96 |
| Tryptophan | 108 | 137 | 96 |
| Tyrosine | 69 | 147 | 114 |
| Valine | 106 | 170 | 50 |

Each value is the propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or a beta sheet; it is a "breaker").
Success rate of 50%
![Page 31: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/31.jpg)
Secondary Structure Method Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score is above a threshold, predict that the window is an alpha helix/beta sheet
TGTAGPOLKCHIQWMLPLKK
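The sliding-window idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Chou-Fasman algorithm itself: the window score is taken to be the mean propensity, the threshold of 100 is an assumed cut-off, letters missing from the table (such as the `O` in the slide's example sequence) get an assumed neutral score, and the function names are hypothetical. The propensities are the P(a) and P(b) columns of the Chou and Fasman table earlier in the deck.

```python
# Chou-Fasman propensities as (P(a) helix, P(b) strand), keyed by one-letter code.
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}
NEUTRAL = 100  # assumption: score used for letters missing from the table


def window_scores(seq, size, column):
    """Mean propensity of every window of `size` residues.

    column 0 scores helix propensity, column 1 scores strand propensity.
    Returns a list of (start_index, mean_score) pairs.
    """
    scores = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        total = sum(PROPENSITY.get(res, (NEUTRAL, NEUTRAL))[column] for res in window)
        scores.append((start, total / size))
    return scores


def candidate_windows(seq, size, column, threshold=100.0):
    """Windows whose mean propensity exceeds the (assumed) threshold."""
    return [
        (start, seq[start:start + size], score)
        for start, score in window_scores(seq, size, column)
        if score > threshold
    ]


if __name__ == "__main__":
    seq = "TGTAGPOLKCHIQWMLPLKK"  # example sequence from the slide
    for start, window, score in candidate_windows(seq, 12, 0):
        print("helix candidate ", start, window, round(score, 1))
    for start, window, score in candidate_windows(seq, 6, 1):
        print("strand candidate", start, window, round(score, 1))
```

Window sizes of 12 and 6 follow the typical helix and strand lengths given on the slide. A real predictor would also have to resolve overlapping helix and strand candidates, which this sketch leaves out.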
![Page 32: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/32.jpg)
Improvements since the 1980s
• Adding information from conservation in MSA
• Smarter algorithms (e.g. machine learning, HMM).
Success rate: 75%-80%
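The slides do not say how the success rate is measured. A common choice is per-residue three-state accuracy, often called Q3: the fraction of residues whose predicted state matches the observed one. The sketch below assumes that definition; the function name `q3_accuracy` and the example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state.

    Both arguments are strings with one state letter per residue,
    for example H (helix), B (beta strand) and L (loop).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Hypothetical prediction compared with a hypothetical observed structure.
    print(q3_accuracy("HHHHLLBB", "HHHLLLBB"))
```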
![Page 33: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/33.jpg)
Machine learning approach for predicting Secondary Structure (PHD, PSIpred)
Step 1: Generating a multiple sequence alignment
[Figure labels: Query; SwissProt; alignment of Query with four Subject sequences]
![Page 34: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/34.jpg)
Step 2: Additional sequences are added using a profile. We end up with an MSA which represents the protein family.
[Figure labels: Query; seed alignment of Query with four Subject sequences; MSA]
![Page 35: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/35.jpg)
Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.
[Figure labels: Query; seed alignment of Query with four Subject sequences; MSA; Machine Learning Approach; Known structures]
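A sequence profile summarizes each column of the MSA as a set of residue frequencies. The Python sketch below shows one simple way to build such a profile. It is an illustration only: PHD and PSIpred use their own weighted profiles, the function name `column_profile` is hypothetical, and the toy alignment is made up for the example.

```python
from collections import Counter


def column_profile(msa):
    """Per-column residue frequencies for equal-length aligned rows.

    Gap characters ('-') are ignored when computing frequencies.
    Returns one dict per alignment column, mapping residue -> frequency.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of the MSA must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


if __name__ == "__main__":
    # Hypothetical toy alignment: one query row and three subject rows.
    toy_msa = ["MKV-L", "MRVAL", "MKIAL", "MKV-L"]
    for position, frequencies in enumerate(column_profile(toy_msa), start=1):
        print(position, frequencies)
```

Fully conserved columns end up with a single residue at frequency 1.0, while variable columns spread their weight over several residues; this is the conservation signal that the learning step uses.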
![Page 36: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/36.jpg)
HMM approach for predicting Secondary Structure (SAM)
• HMM enables us to calculate the probability of assigning a sequence to a secondary structure
TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?
![Page 37: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/37.jpg)
Table built according to a large database of known secondary structures.
[Figure: HMM probability table, with these annotations]
• The probability of beginning with an α-helix
• The probability of an α-helix residue followed by an α-helix residue
• The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15
• The probability of observing Alanine as part of a β-sheet
![Page 38: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/38.jpg)
• The above table enables us to calculate the probability of assigning secondary structure to a protein
• Example
Sequence: TGQ
Structure: HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995
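The product in the example can be reproduced with a small Python sketch. It assumes that the six factors are, in order: the probability of starting in a helix (0.45), the emission of T in a helix (0.041), the helix-to-helix transition (0.8), the emission of G in a helix (0.028), the helix-to-helix transition again (0.8), and the emission of Q in a helix (0.0635). That reading of the numbers, the dictionary names and the function name `path_probability` are assumptions made for illustration; only the values come from the slide.

```python
# Values taken from the slide's example; only the helix state (H) is filled in.
START = {"H": 0.45}                      # probability of beginning with a helix
TRANSITION = {("H", "H"): 0.8}           # helix residue followed by helix residue
EMISSION = {                             # (state, residue) -> emission probability
    ("H", "T"): 0.041,
    ("H", "G"): 0.028,
    ("H", "Q"): 0.0635,
}


def path_probability(sequence: str, states: str) -> float:
    """Joint probability of a residue sequence and a state path under the HMM."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equally long")
    # First position: start probability times emission probability.
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    # Every later position: transition from the previous state, then emission.
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


if __name__ == "__main__":
    print(path_probability("TGQ", "HHH"))
```

Comparing this probability across alternative state paths for the same sequence is what lets the HMM pick the most likely secondary structure assignment.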
![Page 39: Proteins Secondary Structure Predictions](https://reader036.fdocuments.in/reader036/viewer/2022062517/56813572550346895d9cd67d/html5/thumbnails/39.jpg)
Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al, 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN