Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
Multiple Alignments Motifs/Profiles
Multiple Alignments
Motifs/Profiles
• What is multiple alignment?
• HOW does one do this?
• WHY does one do this?
• What do we mean by a motif or profile?
BIO520 Bioinformatics, Jim Lund
Prev. reading: Ch 1-5
Assigned reading: Ch 6.4, 6.5, 6.6
Information from Alignments
• Infer biological function
– Conserved elements critical for function
– Divergent elements relate to divergent function
• Infer structure (2°, 3°)
• Infer phylogeny
– History
– Evolutionary forces (selection…)
How do I find similar sequences?
Multiple Alignment
•Global, Optimal
•Theory
•Computation
•Progressive Alignment
Multiple Alignment: better alignments
Alignment Methods/Programs
• GAP (GCG suite)
– Optimal alignment
• MSA
– (Nearly) optimal alignment
• Clustal W/X
– Progressive alignment
• PSI-BLAST
– Searches for matching sequences iteratively
– Search sequence is the invariant master for the alignment
MSA Strategy
c(A) = Σ c(Ai,j), summed over all sequence pairs (i, j)
Minimize score!
• HUGE matrix, (aa)^(# of seqs) cells: CRASH computer
– time ~ product of sequence lengths
– 1000x10,000 OK, but 200x200x200x200 NOT
• Alignment procedure
– nearly optimal (only considers a subset of all alignments)
– weight sequences via distance
– branch-and-bound algorithm
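The size argument and the sum-of-pairs cost on this slide can be made concrete with a small Python sketch. The function names and the unit mismatch/gap costs are illustrative assumptions only. They are not the scoring scheme or code of the MSA program, and the sketch ignores MSA's sequence weighting.

```python
from itertools import combinations
from math import prod


def lattice_cells(lengths):
    """Approximate number of cells in the full dynamic-programming matrix:
    the product of the sequence lengths (one axis per sequence)."""
    return prod(lengths)


def pair_cost(row_a, row_b, mismatch=1, gap=2):
    """Cost of the pairwise alignment induced by two rows of a multiple
    alignment. The mismatch and gap costs are assumed toy values."""
    cost = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # column is empty for this pair
        if a == "-" or b == "-":
            cost += gap
        elif a != b:
            cost += mismatch
    return cost


def sum_of_pairs_cost(rows):
    """c(A): sum of the induced pairwise costs over all sequence pairs."""
    return sum(pair_cost(a, b) for a, b in combinations(rows, 2))


if __name__ == "__main__":
    print(lattice_cells([1000, 10_000]))        # two sequences
    print(lattice_cells([200, 200, 200, 200]))  # four short sequences
    print(sum_of_pairs_cost(["AC-T", "ACGT", "A-GT"]))
```

Two long sequences give about 10 million cells, while four sequences of only 200 residues give 1.6 billion, which is why the full matrix is avoided.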
Running MSA
• Download and run it locally (UNIX):
– http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/genetic_analysis.html
• On the internet:
– http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html
• Rerun on segments AFTER Clustal...
Clustal Strategy
1. Rapid pairwise alignments, each-to-each
2. Calculate distance matrix
– Create guide tree (neighbor joining)
3. Align
– Closest pairs first
– Add pairs or align sub-alignments
– Adjust similarity matrix as alignment proceeds
4. Add sequences
– Introduce gaps
• Gaps at loops, not inside known 2° structures
• Dynamic gap weighting
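Steps 1 and 2 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Clustal's implementation. It uses a plain Needleman-Wunsch alignment with toy scores in place of Clustal's fast pairwise method and substitution matrices, and it builds the guide tree by average-linkage clustering rather than neighbor joining. The progressive alignment itself (steps 3 and 4) is not implemented. All function names are hypothetical, and sequences are assumed to be non-empty.

```python
from itertools import combinations


def pairwise_distance(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b (Needleman-Wunsch, toy scores) and return
    1 - (identical columns / alignment columns) as a dissimilarity."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identities.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        cols += 1
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 1.0 - same / cols


def guide_tree(seqs):
    """seqs: dict of name -> sequence. Returns nested tuples showing the
    order in which sequences or groups would be aligned (closest first)."""
    leaf_dist = {
        frozenset(pair): pairwise_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in seqs]

    def avg_dist(c1, c2):
        total = sum(leaf_dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTACGG", "D": "TTTTACGC"}
    print(guide_tree(toy))
```

With the made-up toy sequences, A pairs with B and C pairs with D before the two groups are joined, mirroring "closest pairs first".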
Clustal Strategy
Pairwise alignments → Guide tree → Align
Clustal W(X) Strategy
1. Pairwise alignments
The pairwise alignment number here is a dissimilarity measure.
Clustal W(X) Strategy
2. Unrooted neighbor tree (dendrogram)
Clustal W(X) Strategy
3. Guide tree
Clustal W(X) Strategy
4. Progressive alignment using guide tree
Running Clustal W/X
• WWW, Win, Mac, UNIX
– http://www2.ebi.ac.uk/clustalw/
• Input
– Multiple sequence file (PIR, FASTA,…)
• Can FORCE alignments
• Specify secondary structures
• Considerations
– Fast, easy, widely used
– Divergent proteins OK (trees misleading)
“The Right Proteins”: GAPDH
Rabbit KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
*********** :**********.:***.*******************************
“The Right Proteins”: GAPDH
Rabbit KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Human KAEDGKLVIDG-KAITIFQERDPENIKWGDAGTAYVVESTGVFTTMEKAGAHLKGGAKRI 118
Tobacco KVKDEKTLLFGEKSVRVFGIRNPEEIPWAEAGADFVVESTGVFTDKDKAAAHLKGGAKKV 110
Entamoeba EAGENAIIVNGHKIV-VKAERDPAQIGWGALGVDYVVESTGVFTTIPKAEAHIKGGAKKV 105
:. : :: * : : :*:* :* *. *. :********* ** **:*****::
Alignment Interpretation
• DNA sequences
– >50% “worth looking at” (eyeball test)
– ~75% needed for phylogeny
• Polypeptide sequences
– 80% similar = SAME tertiary structure
– 30-80% domains = similar structure
– 15-30% ????
– <15% short motifs
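As a rough illustration of how these rules of thumb might be applied, the Python sketch below computes percent identity for the Rabbit and Chick GAPDH rows shown earlier and maps it onto the polypeptide bands from this slide. Two assumptions are made: percent identity stands in for the slide's "% similar", and columns where either row has a gap are skipped. The function names and the wording of the returned labels are hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identical residues over columns where both rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def interpret_polypeptide(pct):
    """Map a percentage onto the slide's bands for polypeptide sequences."""
    if pct >= 80:
        return "same tertiary structure"
    if pct >= 30:
        return "domains with similar structure"
    if pct >= 15:
        return "unclear (????)"
    return "short motifs only"


if __name__ == "__main__":
    rabbit = "KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV"
    chick = "KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV"
    pct = percent_identity(rabbit, chick)
    print(f"{pct:.1f}% identical: {interpret_polypeptide(pct)}")
```

For this segment the two rows differ at 3 of 59 compared positions, so the pair falls in the top band.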
Uses of Alignment
• Understanding or predicting mutant function
• Finding motifs in DNA or polypeptides
• Directing experiments--e.g. PCR primers
• Phylogeny
“The Right Proteins”
Rabbit KAENGKLVING-KAITIFQERDPANIKWGDAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Chick KAENGKLVING-HAITIFQERDPSNIKWADAGAEYVVESTGVFTTMEKAGAHLKGGAKRV 117
Human KAEDGKLVIDG-KAITIFQERDPENIKWGDAGTAYVVESTGVFTTMEKAGAHLKGGAKRI 118
Tobacco KVKDEKTLLFGEKSVRVFGIRNPEEIPWAEAGADFVVESTGVFTDKDKAAAHLKGGAKKV 110
Entamoeba EAGENAIIVNGHKIV-VKAERDPAQIGWGALGVDYVVESTGVFTTIPKAEAHIKGGAKKV 105
:. : :: * : : :*:* :* *. *. :********* ** **:*****::
Viewing and interpreting alignments
• Color residues by property
• Conservation in the alignment
• Known properties
• Substitution groups: STA, HY
• Physicochemical property
– charge
– hydrophobicity
• Programs for visualization
– Jalview
– AMAS
– Alscript
Viewing alignments
JalView alignment viewer
How to build multiple alignments
1. Find sequences to align (db search).
2. Choose which regions of each protein to include.
• Sequences should be of similar lengths.
3. Run multiple alignment program.
4. Inspect multiple alignment for problems.
• Regions with many gaps have aligned poorly.
5. Remove disruptive sequences and re-run alignment.
6. Add back remaining sequences avoiding disruption.
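Step 4 can be partly automated. The Python sketch below flags alignment columns in which many sequences have a gap, as a first pass before inspecting by eye. The 50% threshold, the function names, and the toy alignment are assumptions made for illustration.

```python
def gap_fraction_per_column(rows, gap="-"):
    """Fraction of sequences with a gap in each alignment column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return [col.count(gap) / len(col) for col in zip(*rows)]


def gappy_columns(rows, threshold=0.5, gap="-"):
    """0-based indices of columns whose gap fraction reaches the threshold."""
    fractions = gap_fraction_per_column(rows, gap)
    return [i for i, f in enumerate(fractions) if f >= threshold]


if __name__ == "__main__":
    # Made-up toy alignment, not real data.
    alignment = ["MKV--LA", "MKVG-LA", "MK---LA", "MRV--LA"]
    print(gap_fraction_per_column(alignment))
    print(gappy_columns(alignment))
```

Sequences that contribute most of the gaps in flagged columns are candidates for removal in step 5.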
Interpro
• Pfam 7.3 (3865 domains),
• PRINTS 33.0 (1650 fingerprints),
• PROSITE 17.5 (1565 and 252 preliminary profiles),
• ProDom 2001.3 (1346 domains),
• SMART 3.1 (509 domains),
• TIGRFAMs 1.2 (814 domains),
• SWISS-PROT 40.27 (113470 entries),
• TrEMBL 21.12 (685610 entries).
Interpro
A database of protein families, domains and functional sites
• PROSITE, home of regular expressions and profiles;
• Pfam, SMART, TIGRFAMs, PIRSF, and SUPERFAMILY, keepers of hidden Markov models (HMMs);
• PRINTS, provider of fingerprints (groups of aligned, un-weighted motifs);
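To show what a PROSITE-style regular expression looks like in practice, the Python sketch below converts a common subset of PROSITE pattern syntax into a Python regex and scans a sequence with it. The converter handles `x`, `[..]`, `{..}`, repeat counts such as `x(2,4)`, and the `<` and `>` end anchors. It is a hypothetical helper rather than an official PROSITE tool, and the test sequences are made up. The example pattern is the N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (common subset) to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")  # anchored at the N-terminus
        at_end = element.endswith(">")      # anchored at the C-terminus
        element = element.strip("<>")
        # Split off an optional repeat suffix such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            rx = "."                         # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"     # any residue except these
        else:
            rx = core                        # literal residue or [ABC] set
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + rx + ("$" if at_end else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    print(regex)
    for seq in ("AANGTVK", "AANPTVK"):  # made-up test sequences
        hit = re.search(regex, seq)
        print(seq, hit.group() if hit else "no match")
```

Profiles and HMMs score every position instead of giving a yes/no match, which is why they are kept as separate database types in the list above.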
Interpro
NCBI CDD (Conserved Domain Database)
Domains from:
• Pfam (Protein families)
– A database of protein families that currently contains > 7973 entries.
• SMART (a Simple Modular Architecture Research Tool)
– More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable.
– Domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues.
• COGs (Clusters of Orthologous Groups)
– Proteins or groups of paralogs from at least 3 lineages that correspond to an ancient conserved domain