Medical Natural Sciences Year 2: Introduction to Bioinformatics Lecture 9: Multiple sequence alignment (III) Centre for Integrative Bioinformatics VU


Medical Natural Sciences Year 2: Introduction to Bioinformatics

Lecture 9: Multiple sequence alignment (III)

Centre for Integrative Bioinformatics VU

Intermezzo: Symmetry-derived secondary structure prediction using multiple sequence alignments (SymSSP)

Victor Simossis, Jaap Heringa

Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, The Netherlands

Symmetry-derived secondary structure prediction using multiple sequence alignments (SymSSP)

• Modern state-of-the-art methods use multiple sequence alignments

• Methods like PHD, Profs, SSpro, etc., predict for the top sequence in the alignment by cutting out positions with gaps in the top sequence

• What if two helices ‘out of phase’ are pasted together? Or a strand and a helix?

• Approach: correct by permuting the alignment and taking a consensus prediction
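The permutation step can be illustrated with a minimal sketch. It only shows how each sequence in turn becomes the top sequence and how columns with a gap in that top sequence are cut out; the prediction method itself (PHD or similar) is not reproduced. The function names and the toy alignment are hypothetical, not taken from SymSSP.

```python
def project_onto_top(alignment, top):
    """Put sequence `top` first and drop every column where it has a gap."""
    order = [top] + [i for i in range(len(alignment)) if i != top]
    rows = [alignment[i] for i in order]
    keep = [c for c, ch in enumerate(rows[0]) if ch != "-"]
    return ["".join(row[c] for c in keep) for row in rows]


def all_permutations(alignment):
    """One projected alignment per sequence, each with a different top sequence."""
    return [project_onto_top(alignment, t) for t in range(len(alignment))]


# Toy alignment (made up for illustration)
toy = ["MK-LV", "MKALV", "M--LI"]
views = all_permutations(toy)
for view in views:
    print(view)
```

Each projected alignment would then be handed to the predictor, giving one prediction per choice of top sequence.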

Secondary structure periodicity patterns

[Figure: hydrophobic/hydrophilic residue patterns for a buried β-strand, an edge β-strand and an α-helix]

Symmetry-derived secondary structure prediction using MA (SymSSP)

[Figure: the alignment of four sequences is permuted so that each sequence in turn is on top (orders 1234, 2134, 3124, 4123). The figure shows four blocks of four predicted secondary structure strings; column spacing was lost in extraction.]

EEEEE HHHHHH EEEEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEEE ?HHHHH EEEE HH

EEEEE HHHHHH EEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEE ?HHHHH EEEE HH

EEEEE HHHH EEE HH
EEEE? ?HHH EEE H
EEEEE HHH? ??EE HH
EEEEE HHH? EEEE HH

EEEEE HHHHHH EEE HHHH
EEEE? ?HHHHH EEE ?HHH
EEEEE HHHHH? ??EE HHHH
EEEEE ?HHHHH EEEE HHHH

[Strings shown at the bottom of the figure:]

EEEEE HHHHH EEE H
EEEE HHHH EE HHH
EEEE HHHHH EEE H
EEEE HHH EEE HH

Optimal segmentation of predicted secondary structures

Each sequence within an alignment gives rise to a library of n secondary structure predictions, where n is the number of sequences in the alignment.

The predictions are recorded by secondary structure type (C, E, H) and region position in a single matrix.

[Figure: library for sequence 1, with mappings 1->1, 1->2, 1->3, 1->4; column spacing was lost in extraction.]

EEEEE HHHHHH EEEEE HH
EEEE? ?HHHHH EEE H
EEEEE HHHHH? ??EE HH
EEEEEE ?HHHHH EEEE HH

H score  0 0 0 0 0 ...
E score  3 4 4 4 3 ...
C score  1 0 0 0 0 ...
? score  0 0 0 0 1 ...
Region   0 1 1 1 0 ...
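As a rough illustration of how such a matrix could be filled, the sketch below counts, per position, how many predictions in a library assign each state. The library strings are made up so that the counts reproduce the first five values printed on the slide; the "Region" row is not computed because the slide does not define it. `tally_library` is a hypothetical name.

```python
def tally_library(library, states="HEC?"):
    """Count, per position, how many predictions assign each state."""
    length = len(library[0])
    if any(len(row) != length for row in library):
        raise ValueError("all predictions must have the same length")
    counts = {s: [0] * length for s in states}
    for row in library:
        for pos, state in enumerate(row):
            if state in counts:
                counts[state][pos] += 1
    return counts


# Hypothetical library of four predictions for the first five positions
library = ["CEEEE", "EEEE?", "EEEEE", "EEEEE"]
scores = tally_library(library)
for state in "HEC?":
    print(state, "score", scores[state])
```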

Optimal segmentation of predicted secondary structures by Dynamic Programming

The recorded values are used in a weighted function according to their secondary structure type, which gives each position a window-specific score. The more probable the secondary structure element, the higher the score.

Restrictions: H only if ws >= 4; E only if ws >= 2 (ws = window size)

Segmentation score: total score of each path

[Figure: DP matrix of sequence position against window size. Inputs per position: H score, E score, C score, ? score, Region. Each cell stores Max score, Offset and Label (example values on the slide: 5, H, 2, 6).]
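A segmentation of this kind can be sketched as a one-dimensional dynamic programme over window sizes. This is a simplified sketch, not the SymSSP scoring: the window score is assumed to be the plain sum of the per-position scores of one label, the "?" and Region rows are ignored, and a minimum length of 1 for C is an assumption. Only the H and E length limits come from the slide.

```python
# H >= 4 and E >= 2 follow the slide; C >= 1 is an assumption of this sketch
MIN_LEN = {"H": 4, "E": 2, "C": 1}


def segment(scores):
    """Return (label string, total score) of the best-scoring segmentation."""
    n = len(scores["C"])
    # Prefix sums make the score of any window an O(1) difference
    prefix = {}
    for lab in MIN_LEN:
        running = [0]
        for value in scores[lab]:
            running.append(running[-1] + value)
        prefix[lab] = running
    neg = float("-inf")
    best = [0] + [neg] * n      # best[i]: best total for positions 0..i-1
    back = [None] * (n + 1)     # back[i]: (window size, label) of the last segment
    for i in range(1, n + 1):
        for lab, min_ws in MIN_LEN.items():
            for ws in range(min_ws, i + 1):
                if best[i - ws] == neg:
                    continue
                total = best[i - ws] + prefix[lab][i] - prefix[lab][i - ws]
                if total > best[i]:
                    best[i], back[i] = total, (ws, lab)
    # Traceback from the last position to the first
    parts, i = [], n
    while i > 0:
        ws, lab = back[i]
        parts.append(lab * ws)
        i -= ws
    return "".join(reversed(parts)), best[n]


# Made-up per-position scores for six positions
scores = {"H": [3, 3, 3, 0, 0, 0],
          "E": [0, 0, 0, 2, 2, 0],
          "C": [1, 1, 1, 1, 1, 3]}
print(segment(scores))
```

In this toy input the length restriction matters: the helix is extended over position 4 although its H score there is 0, because a three-residue helix is not allowed.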

Example of an optimally segmented secondary structure prediction library for sequence 3chy

(One row per line; the original column alignment was lost when whitespace was collapsed.)

3chy ---------------GYVV-----KPFTAATLEEKLNKIFEKLGM------
3chy <- 1fx1 ??????????????? ee ?? hhhhhhhhhhhhhh ????????
3chy <- FLAV_DESDE ??????????????? ee ?? hhhhhhhhhhhhhhh ????????
3chy <- FLAV_DESVH ??????????????? ee ?? hhhhhhhhhhhhhh ????????
3chy <- FLAV_DESGI ??????????????? eee ?? ??hhhhhhhhhhhhh ????????
3chy <- FLAV_DESSA ??????????????? eee ?? ??hhhhhhhhhhhhh ????????
3chy <- 4fxn ??????????????? eee ?? hhhhhhhhhhhhh ?????????
3chy <- FLAV_MEGEL ????????????????eee ?? hh?hhhhhhhhhhh ?????????
3chy <- 2fcr e ? eeeeeee hhhhhhhhhhhhhhh ??????
3chy <- FLAV_ANASP ? eeeeeee hhhhhhhhhhhhhhh ??????
3chy <- FLAV_ECOLI eeeeeee hhhhhhhhhhhhhhh hhhhh
3chy <- FLAV_AZOVI ? eeeeeee hhhhhhhhhhhhhhh ????
3chy <- FLAV_ENTAG e eeeeeeee hhhhhhhhhhhhhhhh? ??????
3chy <- FLAV_CLOAB eeeeeee hhhhhhhhhh ???????????
3chy <- 3chy --------------- ----- hhhhhhhhhhhhhh ------

Consensus ---------------EEEE----- HHHHHHHHHHHHH ------
Consensus-DSSP ...............****.....****xx***************......

PHD --------------- ----- HHHHHHHHHHHHHH ------
PHD-DSSP ...............xxxx.....******************x**......

DSSP ...............EEEE.....SS HHHHHHHHHHHHHHHT ......
LumpDSSP ...............EEEE..... HHHHHHHHHHHHHHH ......


Symmetry-derived secondary structure prediction (SymSSP)

• Tried over 120 different consensus weighting schemes (global, regional, positional)

• Tested on ~2700 HOMSTRAD alignments and compared to PHD: on average 0.5% better

• 60% of the alignments are improved, 20% are not affected and 20% are made worse

• Tried to correlate schemes with “cheap” a priori data (pairwise identities, sequence lengths, number of sequences, etc.)

Integrating secondary structure prediction and multiple sequence alignment

• Low-key example shown of fairly homogeneous data (strings of letters in both cases)

• But already difficult to do and methods are not easily tunable

• How to scale up to knowledge-integrating and inference engines?

Strategies for multiple sequence alignment

• Profile pre-processing

• Secondary structure-induced alignment

• Globalised local alignment

• Matrix extension

Objective: try to avoid (early) errors


Globalised local alignment

• Aim: fill each cell of the DP search matrix with the score of the highest-scoring local alignment going through that cell

• Problem: Forward calculation + traceback for each local alignment is too slow

• Solution: Double dynamic programming

1. Local DP in forward and reverse direction (no traceback) + matrix summation

2. Global DP over matrix from step 1 + traceback
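The two steps can be sketched as follows. This is a simplified sketch under stated assumptions, not the lecturer's implementation: a linear gap penalty replaces the affine pair of opening and extension penalties, a match/mismatch function (`simple_score`, hypothetical) replaces an exchange matrix such as BLOSUM62, and the "summation" is taken as the best local score before a cell plus the cell's own score plus the best local score after it.

```python
def simple_score(x, y):
    """Stand-in for an exchange matrix: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


def local_matrix(a, b, score, gap):
    """Smith-Waterman forward pass without traceback (linear gap penalty)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          h[i - 1][j] - gap,
                          h[i][j - 1] - gap)
    return h


def through_cell_scores(a, b, score, gap):
    """Step 1: t[i][j] = best local alignment score that pairs a[i] with b[j]."""
    fwd = local_matrix(a, b, score, gap)
    rev = local_matrix(a[::-1], b[::-1], score, gap)
    n, m = len(a), len(b)
    t = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            before = fwd[i][j]                   # best local score ending just before (i, j)
            after = rev[n - 1 - i][m - 1 - j]    # best local score starting just after (i, j)
            t[i][j] = before + score(a[i], b[j]) + after
    return t


def global_path(t):
    """Step 2: global DP over t without exchange matrix or gap penalties."""
    n, m = len(t), len(t[0])
    g = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = max(g[i - 1][j - 1] + t[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:                       # traceback
        if g[i][j] == g[i - 1][j - 1] + t[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif g[i][j] == g[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


t = through_cell_scores("GATTACA", "GCATGCA", simple_score, gap=2)
print(global_path(t))   # list of matched (i, j) residue index pairs
```

Only two local passes are needed for step 1, which is the point of the "no traceback" remark: no individual local alignment is ever reconstructed.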

Globalised local alignment

Double dynamic programming

1. Local (SW) alignment (M + Po,e)

[Figure: forward matrix + reverse matrix = summed matrix]

2. Global (NW) alignment (no M or Po,e)

(M = amino acid exchange matrix; Po,e = gap opening and extension penalties)

M = BLOSUM62, Po = 0, Pe = 0

M = BLOSUM62, Po = 12, Pe = 1

M = BLOSUM62, Po = 60, Pe = 5

Strategies for multiple sequence alignment

• Profile pre-processing

• Secondary structure-induced alignment

• Globalised local alignment

• Matrix extension

Objective: try to avoid (early) errors

Integrating alignment methods and alignment information with T-Coffee

• Integrating different pair-wise alignment techniques (NW, SW, ..)

• Combining different multiple alignment methods (consensus multiple alignment)

• Combining sequence alignment methods with structural alignment techniques

• Plug in user knowledge

Matrix extension

T-Coffee: Tree-based Consistency Objective Function For alignmEnt Evaluation

Cedric Notredame, Des Higgins, Jaap Heringa. J. Mol. Biol., 302, 205-217; 2000

Using different sources of alignment information

[Figure: alignment sources feeding into T-Coffee: structure alignments, Clustal, Lalign, Dialign, manual alignments]

Progressive multiple alignment

[Figure: five sequences are compared pairwise (Score 1-2, Score 1-3, ..., Score 4-5); the scores fill a 5×5 similarity matrix, which leads to a guide tree and then to the multiple alignment.]
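The step from similarity matrix to guide tree can be sketched with a simple average-linkage clustering. The slide does not name a tree-building method, so the choice of average linkage here is an assumption, and `guide_tree` and the similarity values are hypothetical.

```python
def guide_tree(sim, names):
    """Repeatedly join the two most similar clusters (average linkage)."""
    clusters = {i: (names[i], 1) for i in range(len(names))}   # id -> (subtree, size)
    s = {(i, j): sim[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = max(s, key=s.get)                 # most similar pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        new_s = {}
        for k in clusters:
            s_ik = s[tuple(sorted((i, k)))]
            s_jk = s[tuple(sorted((j, k)))]
            # Size-weighted average similarity of the merged cluster to k
            new_s[(k, next_id)] = (size_i * s_ik + size_j * s_jk) / (size_i + size_j)
        s = {pair: v for pair, v in s.items() if i not in pair and j not in pair}
        s.update(new_s)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Made-up pairwise similarity scores for four sequences
sim = [[0, 90, 40, 30],
       [90, 0, 45, 35],
       [40, 45, 0, 80],
       [30, 35, 80, 0]]
tree = guide_tree(sim, ["A", "B", "C", "D"])
print(tree)
```

The nested tuples give the order in which a progressive method would align the sequences: the closest pairs first, then the resulting groups.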


Default T-COFFEE

• Uses information from all sequences for each pair-wise alignment

• Reconciles global and local alignment information

T-Coffee matrix extension

[Figure: pairwise alignments of four sequences: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4]


Search matrix extension

T-Coffee

• Combine different alignment techniques by adding scores:

W(A(x), B(y)) = ∑ S(A(x), B(y))

– A(x) is residue x in sequence A

– summation is over the scores S of the global and local alignments containing the residue pair (A(x), B(y))

– S is the sequence identity percentage of the associated alignment

• Combine direct alignment seqA-seqB with each seqA-seqI-seqB:

W'(A(x), B(y)) = W(A(x), B(y)) + ∑_{I≠A,B} min(W(A(x), I(z)), W(I(z), B(y)))

– Summation over all third sequences I other than A or B
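The extension formula can be tried out on a toy library. The sketch below assumes the primary weights W are stored in a dictionary keyed by sorted pairs of residues, each residue written as (sequence name, position), and that the library only contains pairs from different sequences. The data structure, the function name `extend_library` and the weights are all hypothetical and not the T-Coffee file format.

```python
from collections import defaultdict


def extend_library(primary):
    """W'(A(x),B(y)) = W(A(x),B(y)) + sum over third residues I(z) of
    min(W(A(x),I(z)), W(I(z),B(y)))."""
    # neighbours[r] maps every residue paired with r onto the primary weight
    neighbours = defaultdict(dict)
    for (r1, r2), w in primary.items():
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w
    extended = defaultdict(int)
    for pair, w in primary.items():
        extended[tuple(sorted(pair))] += w       # direct alignment term
    for links in neighbours.values():            # each residue acts as I(z) in turn
        partners = list(links.items())
        for idx, (r1, w1) in enumerate(partners):
            for r2, w2 in partners[idx + 1:]:
                if r1[0] != r2[0]:               # ends must lie in different sequences
                    extended[tuple(sorted((r1, r2)))] += min(w1, w2)
    return dict(extended)


# Made-up primary weights for one residue in each of three sequences
primary = {(("A", 1), ("B", 1)): 88,
           (("A", 1), ("C", 1)): 77,
           (("B", 1), ("C", 1)): 100}
extended = extend_library(primary)
for pair, weight in sorted(extended.items()):
    print(pair, weight)
```

A pair that is supported through a third sequence gains weight, which is how information from all sequences enters each pairwise alignment.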


T-Coffee

Direct alignment

Other sequences

T-Coffee library system

Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35

T-Coffee progressive alignment

Input sequences:

MDAGSTVILCFVG
MDAASTILCGS

[Figure labels: Amino Acid Exchange Matrix; Gap penalties (open, extension); Search matrix]

Resulting alignment:

MDAGSTVILCFVG-
MDAAST-ILC--GS


Kinase nucleotide binding sites

Comparing T-Coffee with other methods

but.....

T-COFFEE (V1.23) multiple sequence alignment

Flavodoxin-cheY

1fx1       ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESVH ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK-----
FLAV_DESGI ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK-----
FLAV_DESSA ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK-----
FLAV_DESDE ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK-----
4fxn       ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK-----
FLAV_MEGEL -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK-----
FLAV_CLOAB ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL-----
2fcr       -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP-----
FLAV_ENTAG ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT-----
FLAV_ANASP ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL-----
FLAV_AZOVI ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT-----
FLAV_ECOLI ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL-----
3chy       ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV

Conservation line (column positions lost in extraction): :. . . : . ::

1fx1       ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------
FLAV_DESVH ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI--------
FLAV_DESGI ---------VGVFGCGDSS--YTYFCGA-VDVIEKKAEELGATLVASS---------------------LKIDGEPDSA----EVLDWAREVLARV--------
FLAV_DESSA ---------VSVFGCGDSD--YTYFCGA-VDAIEEKLEKMGAVVIGDS---------------------LKIDGDPE----RDEIVSWGSGIADKI--------
FLAV_DESDE ---------VAAFASGDQE--YEHFCGA-VPAIEERAKELGATIIAEG---------------------LKMEGDASND--PEAVASFAEDVLKQL--------
4fxn       ---------VALFGS------YGWGDGKWMRDFEERMNGYGCVVVETP---------------------LIVQNEPD--EAEQDCIEFGKKIANI---------
FLAV_MEGEL ---------VGLFGS------YGWGSGEWMDAWKQRTEDTGATVIGTA---------------------IV--NEMP--DNAPECKELGEAAAKA---------
FLAV_CLOAB ---------GAAFSTANSI--AGGSDIA-LLTILNHLMVKGMLVY----SGGVAFGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-----------
2fcr       ---------VAIFGLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDG-KFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ENTAG ---------VALFGLGDQLNYSKNFVSA-MRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL-------
FLAV_ANASP ---------VAYFGTGDQIGYADNFQDA-IGILEEKISQRGGKTVGYWSTDGYDFNDSKALRNG-KFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------
FLAV_AZOVI ---------VALFGLGDQVGYPENYLDA-LGELYSFFKDRGAKIVGSWSTDGYEFESSEAVVDG-KFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----
FLAV_ECOLI ---------VALFGCGDQEDYAEYFCDA-LGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy       TAEAKKENIIAAAQAGASGYVVKPFT---AATLEEKLNKIFEKLGM----------------------------------------------------------

Conservation line (column positions lost in extraction): .

Evaluating multiple alignments

• Conflicting standards of truth

– evolution

– structure

– function

• With orphan sequences no additional information

• Benchmarks depending on reference alignments

• Quality issue of available reference alignment databases

• Different ways to quantify agreement with reference alignment (sum-of-pairs, column score)

• “Charlie Chaplin” problem

Evaluating multiple alignments

• As a standard of truth, often a reference alignment based on structural superpositioning is taken

Evaluation measures

[Figure: query alignment compared with reference alignment]

Column score

Sum-of-Pairs score
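Both measures can be sketched for a query and a reference alignment of the same sequences. The definitions used here are the commonly used ones and are an assumption about what the slide depicts: the column score is the fraction of reference columns reproduced exactly in the query, and the sum-of-pairs score is the fraction of residue pairs aligned in the reference that are also aligned in the query. All function names and the toy alignments are hypothetical.

```python
def residue_columns(alignment):
    """Per column, the residue index of each sequence (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        entry = []
        for s, ch in enumerate(column):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """All residue pairs (seq s, index, seq t, index) placed in the same column."""
    pairs = set()
    for col in columns:
        for s in range(len(col)):
            for t in range(s + 1, len(col)):
                if col[s] is not None and col[t] is not None:
                    pairs.add((s, col[s], t, col[t]))
    return pairs


def column_score(query, reference):
    """Fraction of reference columns that occur identically in the query."""
    query_cols = set(residue_columns(query))
    ref_cols = residue_columns(reference)
    return sum(col in query_cols for col in ref_cols) / len(ref_cols)


def sp_score(query, reference):
    """Fraction of reference residue pairs that are also aligned in the query."""
    query_pairs = aligned_pairs(residue_columns(query))
    ref_pairs = aligned_pairs(residue_columns(reference))
    return len(query_pairs & ref_pairs) / len(ref_pairs)


# Made-up alignments of the same three sequences
reference = ["AC-GT", "ACAGT", "A-AGT"]
query = ["A-CGT", "ACAGT", "A-AGT"]
print(column_score(query, reference), sp_score(query, reference))
```

The column score is the stricter of the two: one misplaced residue spoils a whole column, while the sum-of-pairs score still credits the pairs in that column that remain correct.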


Scoring a multiple alignment

Query

Sum-of-Pairs score:

• For each alignment position: take the sum of all pairs (add a.a. exchange values)

• As an option, subtract gap penalties
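A minimal sketch of this score: every pair of residues in every column contributes its exchange value, and a residue paired with a gap optionally costs a penalty. The `identity` function is a hypothetical stand-in for a real amino acid exchange matrix, and skipping gap-gap pairs is an assumption of this sketch.

```python
from itertools import combinations


def identity(x, y):
    """Stand-in for an amino acid exchange matrix: 1 for identical residues, else 0."""
    return 1 if x == y else 0


def sum_of_pairs(alignment, exchange, gap_penalty=0):
    """Sum exchange values over all residue pairs in all columns."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue                    # gap-gap pairs are ignored here
            if x == "-" or y == "-":
                total -= gap_penalty        # optional gap penalty
            else:
                total += exchange(x, y)
    return total


# Made-up three-sequence alignment
alignment = ["MDA", "MEA", "M-A"]
print(sum_of_pairs(alignment, identity))
print(sum_of_pairs(alignment, identity, gap_penalty=1))
```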

Evaluating multiple alignments

[Figure: plot with axes ∆SP and BAliBASE alignment nseq * len]

Summary

• Weighting schemes simulating simultaneous multiple alignment

– Profile pre-processing (global/local)

– Matrix extension (well balanced scheme)

• Smoothing alignment signals

– globalised local alignment

• Using additional information

– secondary structure driven alignment

• Schemes strike balance between speed and sensitivity


References

• Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.

• Notredame, C., Higgins, D.G., Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.

• Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26(5), 459-477.

Where to find this: http://www.ibivu.cs.vu.nl/teaching