
©CMBI 2005

Why align sequences?

Lots of sequences have unknown structure and function. A few sequences have known structure and function.

If they align, they are similar.

If they are similar, then they might have the same structure or function.

If one of them has a known structure/function, then alignment to the other yields insight about how the structure or function works.


Sequence Alignment

The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.

gap = insertion or deletion


Alignment

To carry over information from a well-studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures as they are today: a structural alignment.


Alignment

The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the “same” position in the 3D structures of the corresponding proteins!!

Two very simple examples:

1) the 3 active site residues of the serine protease we saw earlier

2) Cys-bridges:

STCTKGALKLPVCRK
TSCTEG--RLPGCKR
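As a small illustration of what "same column" means for the Cys-bridge example, the sketch below lists the alignment columns in which both rows carry a Cys. The function name `shared_cys_columns` is made up for this illustration and does not come from any alignment package.

```python
def shared_cys_columns(row_a, row_b):
    """Return the 1-based alignment columns where both rows have a Cys ('C')."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(row_a, row_b))
            if a == "C" and b == "C"]


# The two aligned rows from the Cys-bridge example on the slide.
row_1 = "STCTKGALKLPVCRK"
row_2 = "TSCTEG--RLPGCKR"

print(shared_cys_columns(row_1, row_2))  # [3, 13]
```

Both Cys residues of the second sequence land under the Cys residues of the first, which is what one would expect if the same bridge exists in both structures.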


Things one can do with a good alignment

Carry information from a well studied to a less well studied protein.

Such information can be:

Phosphorylation sites

Glycosylation sites

Stabilizing mutations

Membrane anchors

Ion binding sites

Ligand binding residues

Cellular localization


Significance of alignment

One can only transfer information if the similarity between the two sequences is sufficiently high.

Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:

If the sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.

If the sequences are shorter, a higher percentage of identity is needed.

Structure is much more conserved than sequence!
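The rule of thumb above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, "length" is taken here to mean the number of columns without a gap (an assumption), and for alignments of 80 residues or fewer the function returns `None` because the slide gives no number for that part of the curve.

```python
def percent_identity(row_a, row_b):
    """Percent identical residues over the columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


def can_transfer_structure(row_a, row_b):
    """Apply only the stated rule: > 80 aligned residues and > 25% identity.

    Returns None for shorter alignments; the required identity is higher
    there, but no value is stated, so none is guessed.
    """
    aligned = sum(1 for a, b in zip(row_a, row_b) if a != "-" and b != "-")
    if aligned <= 80:
        return None
    return percent_identity(row_a, row_b) > 25.0


# Made-up toy rows, 90 columns long, identical in 3 out of every 10 columns.
toy_a = "ACDEFGHIKL" * 9
toy_b = "ACDWWWWWWW" * 9

print(percent_identity(toy_a, toy_b))        # 30.0
print(can_transfer_structure(toy_a, toy_b))  # True
```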


Significance of alignment (2)


Aligning sequences by hand

Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.

Examples: which is the better alignment (left or right)?

1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW

2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
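One way to make the "left or right?" question concrete is to score each candidate with a toy measure of physico-chemical similarity. The residue classes and the scoring (2 for identical, 1 for same class, 0 otherwise) are assumptions chosen for this sketch; real alignment programs use substitution matrices and gap penalties instead.

```python
# Toy residue classes; this grouping is an assumption made for the example.
CLASSES = {
    "aliphatic": "AILMV",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def toy_score(row_a, row_b):
    """2 per identical pair, 1 per same-class pair, 0 otherwise; gap columns are skipped."""
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            score += 2
        elif CLASS_OF[a] == CLASS_OF[b]:
            score += 1
    return score


# The four candidate alignments from the two examples on the slide.
examples = {
    "1 left": ("CPISRTWASIFRCW", "CPISRT---LFRCW"),
    "1 right": ("CPISRTWASIFRCW", "CPISRTL---FRCW"),
    "2 left": ("CPISRTRASEFRCW", "CPISRTK---FRCW"),
    "2 right": ("CPISRTRASEFRCW", "CPISRT---KFRCW"),
}
for name, (a, b) in examples.items():
    print(name, toy_score(a, b))
```

With these assumed classes the left alignment scores 21 and the right one 20 in both examples: on the left, Leu sits under Ile (both aliphatic) and Lys under Arg (both positive), whereas on the right Leu sits under Trp and Lys under Glu.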


Aligning sequences by hand (2)

The alignment procedure depends on the information available:

1) Use “only” the identity of each amino acid and its physico-chemical properties. This is more or less what alignment programs do.

2) Also explicitly use the secondary structure preferences of the amino acids.

3) Use 3D information if one or more of the structures in the alignment are known.

In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.


Helix


Helix preferences

      -4   -3   -2   -1    1    2    3    4    5  total
       -    -    -    -    H    H    H    H    H
ALA  143  148   99   58  189  205  187  241  268   1538
CYS   24   31   29   22   14   17   18   33   17    205
ASP   98  110  121  260   98  197  167   49   86   1186
GLU   91  100   71   71  152  287  269   70  147   1258
PHE   53   70   90   29   68   46   49  107   65    577
GLY  207  246  166  192   96  127   99   65   60   1258
HIS   48   50   39   46   28   36   38   24   30    339
ILE   94   81  133   19   79   45   68  161   99    779
LYS   99   98   80   46   98  105   69   80  154    829
LEU  105  111  188   50  140   84  113  281  209   1281
MET   37   20   51   13   26   22   54   61   67    351
ASN  103   83   89  206   46   62   55   37   77    758
PRO  143  136  121   99  240   78   40    0    0    857
GLN   48   58   40   38   83   93  124   76  101    661
ARG   82   63   59   51   71   75   61  114  109    685
SER  112  128   98  292  105  126   99   48   76   1084
THR  106   99  119  253   91   80  115   72   67   1002
VAL  141  107  132   37  117   74  120  208  120   1056
TRP   29   25   29   14   30   26   28   30   29    240
TYR   66   65   75   33   58   44   56   72   48    517
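To show how such a table can be read, the sketch below takes a few rows from the table above and reports, per residue, the position with the highest count and any positions where the residue was never observed. The helper names are hypothetical, and the raw counts are used as they are, without correcting for how common each amino acid is overall (a simplification).

```python
# Positions relative to the helix start: -4..-1 are before the helix, 1..5 are in it.
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

# Counts copied from the helix preference table (a subset of the rows).
COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "ASP": [98, 110, 121, 260, 98, 197, 167, 49, 86],
    "GLY": [207, 246, 166, 192, 96, 127, 99, 65, 60],
    "LEU": [105, 111, 188, 50, 140, 84, 113, 281, 209],
    "PRO": [143, 136, 121, 99, 240, 78, 40, 0, 0],
    "SER": [112, 128, 98, 292, 105, 126, 99, 48, 76],
}


def most_frequent_position(residue):
    """Position at which the residue has its highest raw count."""
    counts = COUNTS[residue]
    return POSITIONS[counts.index(max(counts))]


def never_observed(residue):
    """Positions at which the residue has a count of zero."""
    return [p for p, c in zip(POSITIONS, COUNTS[residue]) if c == 0]


for residue in COUNTS:
    print(residue, most_frequent_position(residue), never_observed(residue))
```

In this subset, Ser and Asp peak at position -1 (just before the helix), Pro peaks at position 1 and is never seen at positions 4 and 5, and Ala and Leu peak inside the helix.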


Helix preferences and alignment

1) S G V S P D Q L A A L K L I L E L A L K
2) G T S L E T A L L M Q I A Q K L I A G

S G V S P D Q L A A L K L I L E L A L K
-1-4-4-1-4-1 3-2 1 1-2 2 -3-2 -3 2 5 1 2 2 1 5 4 -2 3 4 3 3 4 1 5 4 4 5 5 5

G T S L E T A L L M Q I A Q K L I A G
-4-1-1-2 2-1 1-2 -3 3 1 3 3 2 1 4 3 4 5 4 5 5


Final alignment:

S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G


A ‘real’ example of threading

If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside:

Where does the Arg in structure 2 go?

(and what will CLUSTAL choose?)


An even more real example

  1   2   3   4   5   6   7   8   9  10  11
ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
VAL CYS ARG --- --- --- THR PRO GLU ALA ILE


VVV CCC RRR LT- PP- G-- S-T A-P EEE AAA III


Multiple sequence alignment

Multiple sequence alignments can confirm or improve pair-wise sequence alignments:

CWPVAASYGR    CWPVAASYGR
CWPT---YGR
CWPTA-SYGR    CWPTA-SYGR
CWPTLGLFGR

?



Multiple sequence alignments can reveal structural information:

ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
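One possible reading of this alignment is that Cys residues which appear and disappear together in two columns form a Cys-bridge. The sketch below looks for such column pairs; the function name is hypothetical, and "exactly the same set of sequences has a Cys" is an assumed, deliberately strict criterion.

```python
from collections import defaultdict


def correlated_cys_columns(rows):
    """Group 1-based columns by the exact set of rows that carry a Cys there."""
    by_pattern = defaultdict(list)
    for col in range(len(rows[0])):
        with_cys = frozenset(i for i, row in enumerate(rows) if row[col] == "C")
        # Skip columns without Cys and fully conserved columns,
        # because they show no co-variation.
        if with_cys and len(with_cys) < len(rows):
            by_pattern[with_cys].append(col + 1)
    return sorted(cols for cols in by_pattern.values() if len(cols) > 1)


# The six aligned sequences from the slide.
msa = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]

print(correlated_cys_columns(msa))  # [[3, 13], [7, 19]]
```

Columns 3 and 13 always have a Cys in the same sequences, and so do columns 7 and 19, which would fit two bridges: 3-13 and 7-19.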



Multiple sequence alignments can validate PROSITE search results.

In N-{P}-[ST]-{P} the N is the glycosylation site.

The chance of finding N-{P}-[ST]-{P} is rather high.

So how can you be sure? Look at the multiple sequence alignment:

ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
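The check can be sketched as follows: translate the pattern into a regular expression, find every hit in each aligned sequence, and count per column how many sequences have a hit there. The function name is hypothetical, and the sketch assumes gap-free rows, as in this example; with gaps, the hits would have to be found in the ungapped sequences and mapped back to columns.

```python
import re
from collections import Counter

# N-{P}-[ST]-{P} as a regular expression; the lookahead also finds overlapping hits.
PATTERN = re.compile(r"(?=N[^P][ST][^P])")


def pattern_columns(rows):
    """Count, per 1-based column, how many rows start a pattern hit there."""
    hits = Counter()
    for row in rows:
        for match in PATTERN.finditer(row):
            hits[match.start() + 1] += 1
    return dict(sorted(hits.items()))


# The three aligned sequences from the slide (no gaps).
msa = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]

print(pattern_columns(msa))  # {5: 3, 13: 1, 19: 2}
```

The hit at column 5 is present in all three sequences, the one at column 19 in two, and the one at column 13 in only one, so the conserved hit at column 5 is the most convincing candidate.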


Summary

Bioinformatics is all about obtaining information. Everything you can find in a database saves you doing experiments.

Sequence alignment is important for carrying over information between ‘similar proteins’.

To align sequences, you need to understand the amino acids.
