©CMBI 2005
Why align sequences?
Lots of sequences with unknown structure and function. A few sequences with known structure and function.
If they align, they are similar.
If they are similar, then they might have the same structure or function.
If one of them has known structure/function, then alignment to the other yields insight into how the structure or function works.
Sequence Alignment
The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.
gap = insertion or deletion
Alignment
To carry over information from a well studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures as they are today: a structural alignment.
Alignment
The implicit meaning of placing amino acid residues below each other in the same column of a (multiple) protein sequence alignment is that they are at the “same” position in the 3D structures of the corresponding proteins.
Two very simple examples:
1) the 3 active site residues of the serine protease we saw earlier
2) Cys-bridges:
STCTKGALKLPVCRK
TSCTEG--RLPGCKR
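The Cys-bridge example can be checked mechanically. The sketch below assumes the fused line on the slide is two aligned rows of 15 columns each; the function name `shared_cys_columns` is made up for illustration.

```python
def shared_cys_columns(row_a: str, row_b: str) -> list[int]:
    """Return the 1-based alignment columns where both rows carry a cysteine."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return [
        col
        for col, (a, b) in enumerate(zip(row_a, row_b), start=1)
        if a == b == "C"
    ]


if __name__ == "__main__":
    # The two rows from the slide; '-' marks a gap.
    print(shared_cys_columns("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

If the alignment is right, the cysteines of both sequences land in the same columns, which is what one would expect for residues forming the same bridge in both structures.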
Things one can do with a good alignment
Carry information from a well studied to a less well studied protein.
Such information can be:
Phosphorylation sites
Glycosylation sites
Stabilizing mutations
Membrane anchors
Ion binding sites
Ligand binding residues
Cellular localization
Significance of alignment
One can only transfer information if the similarity between the two sequences is sufficiently high.
Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:
If the sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.
If the sequences are shorter, a higher percentage of identity is needed.
Structure is much more conserved than sequence!
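The threshold idea can be sketched in a few lines. The flat 25% above 80 residues comes from the slide. The curve for shorter alignments, 290.15 * L^-0.562, is the commonly quoted fit for the Sander and Schneider threshold and is an assumption here, not something the slides state. All function names are illustrative.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def identity_threshold(aligned_length: int) -> float:
    """Minimum percent identity for transferring structural information.

    Above 80 residues: flat 25% (from the slide).
    At or below 80 residues: 290.15 * L**-0.562 (ASSUMED fit, not from the slide).
    """
    if aligned_length < 1:
        raise ValueError("aligned_length must be positive")
    if aligned_length > 80:
        return 25.0
    return 290.15 * aligned_length ** -0.562


def can_transfer_structure(identity: float, aligned_length: int) -> bool:
    """True if the identity lies above the (assumed) threshold curve."""
    return identity > identity_threshold(aligned_length)


if __name__ == "__main__":
    for length in (20, 40, 80, 150):
        print(length, round(identity_threshold(length), 1))
```

The point of the sketch is only the shape of the curve: the shorter the aligned stretch, the more identity is needed before structural information can be carried over.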
Aligning sequences by hand
Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.
Examples: which is the better alignment (left or right)?
1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW
2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
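One way to make the left-or-right question concrete is to score each alternative with a crude physico-chemical grouping. The residue classes and the 2/1/0 scoring below are assumptions chosen for illustration; they are not a published substitution matrix and not what any particular alignment program uses.

```python
# Assumed residue classes (illustrative only).
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def class_score(row_a: str, row_b: str) -> int:
    """2 per identical pair, 1 per pair in the same class, 0 otherwise; gaps score 0."""
    total = 0
    for a, b in zip(row_a, row_b):
        if "-" in (a, b):
            continue
        if a == b:
            total += 2
        elif CLASS_OF[a] == CLASS_OF[b]:
            total += 1
    return total


if __name__ == "__main__":
    examples = {
        "1": ("CPISRTWASIFRCW", "CPISRT---LFRCW", "CPISRTL---FRCW"),
        "2": ("CPISRTRASEFRCW", "CPISRTK---FRCW", "CPISRT---KFRCW"),
    }
    for label, (top, left, right) in examples.items():
        print(label, "left:", class_score(top, left), "right:", class_score(top, right))
```

Both alternatives in each example have the same number of identities and the same gap length, so the only difference is which residue the single non-identical one is paired with (L with I or W; K with R or E). Under the grouping assumed here the left alternative scores higher in both examples.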
Aligning sequences by hand (2)
The procedure for aligning depends on the information available:
1) Use “only” the identity of the amino acids and their physico-chemical properties. This is more or less what alignment programs do.
2) Also use explicitly the secondary structure preference of the amino acids.
3) Use 3D information if one or more of the structures in the alignment are known.
In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.
Helix preferences
pos    -4   -3   -2   -1    1    2    3    4    5  total
str     -    -    -    -    H    H    H    H    H
ALA   143  148   99   58  189  205  187  241  268   1538
CYS    24   31   29   22   14   17   18   33   17    205
ASP    98  110  121  260   98  197  167   49   86   1186
GLU    91  100   71   71  152  287  269   70  147   1258
PHE    53   70   90   29   68   46   49  107   65    577
GLY   207  246  166  192   96  127   99   65   60   1258
HIS    48   50   39   46   28   36   38   24   30    339
ILE    94   81  133   19   79   45   68  161   99    779
LYS    99   98   80   46   98  105   69   80  154    829
LEU   105  111  188   50  140   84  113  281  209   1281
MET    37   20   51   13   26   22   54   61   67    351
ASN   103   83   89  206   46   62   55   37   77    758
PRO   143  136  121   99  240   78   40    0    0    857
GLN    48   58   40   38   83   93  124   76  101    661
ARG    82   63   59   51   71   75   61  114  109    685
SER   112  128   98  292  105  126   99   48   76   1084
THR   106   99  119  253   91   80  115   72   67   1002
VAL   141  107  132   37  117   74  120  208  120   1056
TRP    29   25   29   14   30   26   28   30   29    240
TYR    66   65   75   33   58   44   56   72   48    517
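A small sketch of how such a table can be queried. Only four rows are copied in to keep it short, and it works on the raw counts; whether the slides normalise the counts before using them is not stated, so treat the output as illustrative. The names `POSITIONS`, `COUNTS`, `fractions` and `most_frequent_position` are made up here.

```python
# Positions relative to the helix start: -4..-1 precede the helix, 1..5 are inside it.
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

# Raw counts copied from four rows of the table.
COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "GLY": [207, 246, 166, 192, 96, 127, 99, 65, 60],
    "PRO": [143, 136, 121, 99, 240, 78, 40, 0, 0],
    "SER": [112, 128, 98, 292, 105, 126, 99, 48, 76],
}


def fractions(residue: str) -> dict[int, float]:
    """Share of a residue's occurrences found at each position."""
    row = COUNTS[residue]
    total = sum(row)
    return {pos: count / total for pos, count in zip(POSITIONS, row)}


def most_frequent_position(residue: str) -> int:
    """Position with the highest raw count for this residue."""
    row = COUNTS[residue]
    return POSITIONS[row.index(max(row))]


if __name__ == "__main__":
    for name in COUNTS:
        print(name, most_frequent_position(name))
```

In the raw counts, Ser peaks at the position just before the helix, Pro peaks at the first helix position and is never seen at positions 4 and 5, and Ala is most frequent inside the helix.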
Helix preferences and alignment
1) S G V S P D Q L A A L K L I L E L A L K
2) G T S L E T A L L M Q I A Q K L I A G

S G V S P D Q L A A L K L I L E L A L K
-1-4-4-1-4-1 3-2 1 1-2 2 -3-2 -3 2 5 1 2 2 1 5 4 -2 3 4 3 3 4 1 5 4 4 5 5 5

G T S L E T A L L M Q I A Q K L I A G
-4-1-1-2 2-1 1-2 -3 3 1 3 3 2 1 4 3 4 5 4 5 5
Helix preferences and alignment
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G
A ‘real’ example of threading
If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside:
Where does the Arg in structure 2 go?
(and what will CLUSTAL choose?)
[Figure: structures 1 and 2]
An even more real example
1   2   3   4   5   6   7   8   9   10  11
ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
Column view (as on the slide): VVV CCC RRR LT- PP- G-- S-T A-P EEE AAA III
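The two alternative gap placements in this example can be compared by simply counting identical columns. This is a deliberately naive, identity-only sketch; real alignment programs use substitution matrices and gap penalties, so it only hints at why a sequence-only method may prefer one placement while the structure supports the other. The variable and function names are illustrative.

```python
REFERENCE = "ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL".split()
ALT_A = "VAL CYS ARG THR PRO --- --- --- GLU ALA ILE".split()  # gap after THR PRO
ALT_B = "VAL CYS ARG --- --- --- THR PRO GLU ALA ILE".split()  # gap before THR PRO


def identical_columns(ref: list[str], other: list[str]) -> list[int]:
    """1-based columns where both rows hold the same residue (gaps never count)."""
    return [
        col
        for col, (r, o) in enumerate(zip(ref, other), start=1)
        if r == o and r != "---"
    ]


if __name__ == "__main__":
    print("A:", identical_columns(REFERENCE, ALT_A))
    print("B:", identical_columns(REFERENCE, ALT_B))
```

The first placement keeps the PRO/PRO identity in column 5 and so has one more identical column than the second. An identity count alone cannot tell which placement matches the 3D structures.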
Multiple sequence alignment
Multiple sequence alignments can confirm or improve pair-wise sequence alignments:
CWPVAASYGR

CWPVAASYGR
CWPT---YGR
CWPTA-SYGR

CWPTA-SYGR
CWPTLGLFGR
?
Multiple sequence alignment
Multiple sequence alignments can reveal structural information:
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
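One kind of structural information that can be read from such an alignment is which cysteines appear and disappear together. The sketch below assumes the block on the slide is six rows of 22 residues each, and treats co-varying cysteine columns as candidate bridge partners. That interpretation is a plausible reading of the slide, not something it states; the function name is illustrative.

```python
from collections import defaultdict

ALIGNMENT = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]


def covarying_cys_columns(rows: list[str]) -> list[list[int]]:
    """Group 1-based columns whose Cys presence/absence pattern is identical.

    Only columns where Cys occurs in some, but not all, rows are considered,
    and only groups of two or more columns are returned.
    """
    by_pattern: dict[tuple[bool, ...], list[int]] = defaultdict(list)
    for col in range(len(rows[0])):
        pattern = tuple(row[col] == "C" for row in rows)
        if any(pattern) and not all(pattern):
            by_pattern[pattern].append(col + 1)
    return [cols for cols in by_pattern.values() if len(cols) > 1]


if __name__ == "__main__":
    print(covarying_cys_columns(ALIGNMENT))
```

Columns whose cysteines are always present or absent in the same sequences are natural candidates for the two halves of one bridge.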
Multiple sequence alignment
Multiple sequence alignments can validate PROSITE search results.
In N-{P}-[ST]-{P} the N is the glycosylation site.
The chance of finding N-{P}-[ST]-{P} is rather high.
So how can you be sure? Look at the multiple sequence alignment:
ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
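The check described here can be sketched with a regular expression. The PROSITE pattern N-{P}-[ST]-{P} translates to the regex `N[^P][ST][^P]`. The sketch assumes the block on the slide is three ungapped rows of 27 residues, so that a string index equals an alignment column; the function names are illustrative.

```python
import re

# Lookahead so that overlapping matches are all reported.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")

ROWS = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]


def motif_columns(sequence: str) -> set[int]:
    """1-based positions of the N in every N-{P}-[ST]-{P} match."""
    return {match.start() + 1 for match in MOTIF.finditer(sequence)}


def conserved_motif_columns(rows: list[str]) -> list[int]:
    """Columns where the motif starts in every row of the alignment."""
    return sorted(set.intersection(*(motif_columns(row) for row in rows)))


if __name__ == "__main__":
    for row in ROWS:
        print(row, sorted(motif_columns(row)))
    print("conserved:", conserved_motif_columns(ROWS))
```

A hit that occurs in the same column in every sequence is a much stronger candidate than a hit found in only one of them.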