![Page 1: “Homology-enhanced probabilistic consistency” multiple sequence alignment : a case study on transmembrane protein Jia-Ming Chang 2013-July-09 Chang, J-M,](https://reader030.fdocuments.in/reader030/viewer/2022032802/56649de55503460f94add55f/html5/thumbnails/1.jpg)
“Homology-enhanced probabilistic consistency” multiple sequence alignment :
a case study on transmembrane protein
Jia-Ming Chang
2013-July-09
Chang, J-M, P Di Tommaso, J-F Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.
Transmembrane protein
Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes.
Odorant receptors
Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no. 5951 (October 16, 2009): 382-383.
Transmembrane protein multiple sequence alignment
• 1994: alignment of transmembrane proteins is first addressed
– Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):388-396.
• Few multiple sequence alignment methods since then (only 3)
– Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):758-769.
– Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):508-517.
– Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
BAliBASE 2.0 reference 7
Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
We need an accurate Transmembrane MSA!
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
Pair-hidden Markov Model
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.
Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.
Probabilistic consistency transformation
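As a rough illustration of the transformation described in the ProbCons paper cited above, the sketch below re-estimates the match probability between two sequences x and y by averaging over every sequence z in the set: P'(x_i ~ y_j) = (1/|S|) Σ_z Σ_k P(x_i ~ z_k) P(z_k ~ y_j), where a sequence aligned with itself contributes an identity matrix. The function names, the dictionary layout, and the toy posterior values are assumptions made for this example and do not come from the slides.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """Posterior matrix of a sequence aligned with itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(post, lengths, x, y):
    """Re-estimate the x/y match posteriors through every sequence z.

    post[(a, b)] is a len(a) x len(b) matrix of pairwise match posteriors
    (hypothetical layout); lengths maps sequence name -> sequence length.
    """
    def p(a, b):
        if a == b:
            return identity(lengths[a])
        if (a, b) in post:
            return post[(a, b)]
        # Only one orientation is stored, so transpose the other one.
        return [list(col) for col in zip(*post[(b, a)])]

    total = [[0.0] * lengths[y] for _ in range(lengths[x])]
    for z in lengths:
        prod = matmul(p(x, z), p(z, y))
        for i, row in enumerate(prod):
            for j, value in enumerate(row):
                total[i][j] += value
    return [[value / len(lengths) for value in row] for row in total]


# Toy data: three sequences of length 2 with made-up posteriors.
lengths = {"x": 2, "y": 2, "z": 2}
post = {
    ("x", "y"): [[0.6, 0.1], [0.1, 0.6]],
    ("x", "z"): [[0.9, 0.0], [0.0, 0.9]],
    ("z", "y"): [[0.9, 0.0], [0.0, 0.9]],
}
new_xy = consistency_transform(post, lengths, "x", "y")
print(new_xy)
```

In this toy case the strong x/z and z/y matches raise the x/y diagonal above its original value of 0.6, which is the intended effect of the transformation.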
Homology-extended probabilistic consistency
The new emission probabilities are defined as follows:
$$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \, \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$
where $\alpha_m$ is the frequency with which residue m appears at position i and $\beta_n$ is the frequency with which residue n appears at position j; $p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$ is the original emission probability in ProbCons.
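A minimal sketch of this profile-weighted emission, assuming each profile column is stored as a dictionary of residue frequencies. The function name and the two-residue toy probabilities are illustrative assumptions; they are not BLOSUM62-derived values and are not taken from ProbCons or PSI-Coffee.

```python
def profile_emission(alpha, beta, pair_prob):
    """Return p'(x_i, y_j) = sum_m sum_n alpha[m] * beta[n] * pair_prob[(m, n)].

    alpha, beta: residue -> frequency at profile positions i and j.
    pair_prob:   (residue, residue) -> original pairwise emission probability.
    """
    return sum(a * b * pair_prob[(m, n)]
               for m, a in alpha.items()
               for n, b in beta.items())


# Toy emission table over a two-letter alphabet (made-up numbers).
toy_pair_prob = {
    ("A", "A"): 0.4, ("A", "L"): 0.1,
    ("L", "A"): 0.1, ("L", "L"): 0.4,
}

# A column containing a single residue type reduces to the original emission.
single = profile_emission({"A": 1.0}, {"A": 1.0}, toy_pair_prob)

# A mixed column blends the emissions of its residues.
mixed = profile_emission({"A": 0.5, "L": 0.5}, {"L": 1.0}, toy_pair_prob)
print(single, mixed)
```

When both columns hold one residue with frequency 1, the sum collapses to the plain sequence-to-sequence emission, so the profile version generalises the original model rather than replacing it.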
Homology-extended probabilistic consistency
where αi, βj, and rk are the profile frequencies.
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.
Que1: how to build a profile?
Que2: how to score profiles?
Que1: how to build a profile?
• Database size
• Searching parameters
– E-value: most used; anything else?
1. Matrix file: -M
2. Filter the query sequence for low-complexity subsequences: -F
3. Neighborhood word threshold: -f
4. Truncate the report to a number of alignments: -b
Word hit & Neighborhood
Searching parameters
• Fast, insensitive search
– High percent identity
– blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5
• Slow, sensitive search
– Increase sensitivity, decrease specificity
– blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000
• Book “BLAST”, pages 146-147
Different databases
• Full databases: UniProt (release 15.15, 2010), NCBI non-redundant (NR), UniRef50, UniRef90, UniRef100
• Transmembrane subsets, selected with keyword:"Transmembrane [KW-0812]": UniRef50-TM, UniRef90-TM, UniRef100-TM, UniProt-TM
Database size

| Data set | No. of sequences |
|---|---|
| UniRef50-TM | 87,989 |
| UniRef90-TM | 263,306 |
| UniRef100-TM | 613,015 |
| UniProt-TM | 818,635 |
| UniRef50 | 3,077,464 |
| UniRef90 | 6,544,144 |
| UniRef100 | 9,865,668 |
| UniProt | 11,009,767 |
| NCBI NR | 10,565,004 |
Performance comparison of different database sizes for BAliBASE2-ref7.
UniRef50-TM contains roughly 125 times fewer sequences than the full UniProt (87,989 vs 11,009,767).
The level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while the CPU time requirements decrease by a factor of 10.
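The size comparison can be checked with simple arithmetic on the sequence counts listed in the database size table. The sketch below is only a convenience calculation; the dictionary and function names are assumptions made for the example.

```python
# Sequence counts copied from the database size table in this transcript.
sizes = {
    "UniRef50-TM": 87_989,
    "UniRef90-TM": 263_306,
    "UniRef100-TM": 613_015,
    "UniProt-TM": 818_635,
    "UniProt": 11_009_767,
}


def fold_reduction(full, subset):
    """How many times smaller the subset is than the full database."""
    return sizes[full] / sizes[subset]


ratio = fold_reduction("UniProt", "UniRef50-TM")
print(f"UniRef50-TM is about {ratio:.0f}x smaller than UniProt")

# The same ratio for every transmembrane subset.
for name in ("UniRef50-TM", "UniRef90-TM", "UniRef100-TM", "UniProt-TM"):
    print(name, round(fold_reduction("UniProt", name), 1))
```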
10% more columns are correctly aligned when compared with PRALINETM.
The rows Pairs and Cols give the total number of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns in total.
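To make the two counts concrete, here is a minimal sketch of how correctly aligned pairs and columns could be counted against a reference alignment. It is a simplification under stated assumptions: alignments are lists of equal-length gapped strings with sequences in the same order, a column counts as correct only if the test alignment contains an identical column, and core-block restrictions are ignored. It is not the evaluation code used for the results reported here.

```python
from itertools import combinations


def residue_columns(alignment):
    """Describe each column by the residue index of every sequence (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for column in columns:
        for (s, i), (t, j) in combinations(enumerate(column), 2):
            if i is not None and j is not None:
                pairs.add((s, i, t, j))
    return pairs


def count_correct(test, reference):
    """Return (correctly aligned pairs, correctly aligned columns)."""
    ref_cols = residue_columns(reference)
    test_cols = set(residue_columns(test))
    correct_pairs = len(aligned_pairs(ref_cols) & aligned_pairs(test_cols))
    correct_cols = sum(1 for column in ref_cols if column in test_cols)
    return correct_pairs, correct_cols


# Made-up toy alignments of three short sequences.
reference = ["AC-GT", "ACTGT", "A--GT"]
test = ["ACG-T", "ACTGT", "A-G-T"]
result = count_correct(test, reference)
print(result)
```

Dividing each count by the corresponding total in the reference alignments would turn the counts into fractions of correctly aligned pairs and columns.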
BAliBASE 3.0
The performance figures for the other methods are taken from Rausch et al. The SP and TC scores of the full-length sequences are evaluated on the core blocks (as annotated in the XML files).
Que2: how to score profiles?
Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301-1308.
• Prediction mode: -template_file PSITM
• Output: -output tm_html
This output was obtained on Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.
Paolo Di Tommaso
http://tcoffee.crg.cat/tmcoffee