Elements of Bioinformatics (14F001) TP2: Gene prediction 22 October 2012 CORRECTIONS
Notice:
During this practical, you will need to use the ‘raw’ and ‘fasta’ sequence formats.
For additional information on the different sequence formats available, please have a look at http://www.genomatix.de/online_help/help/sequence_formats.html
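To make the difference between the two formats concrete, here is a minimal Python sketch. It assumes that ‘raw’ means the bare sequence letters with no header line, and that ‘fasta’ means a `>` header line followed by sequence lines. The function names and the toy record are hypothetical and are not part of the practical.

```python
def fasta_to_raw(text):
    """Map each FASTA header (without '>') to its raw sequence string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def raw_to_fasta(name, sequence, width=60):
    """Wrap a raw sequence into a FASTA record with lines of `width` letters."""
    lines = [">" + name]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines)


# Hypothetical toy record, only to show the round trip.
example = ">toy_sequence hypothetical example\nACGTACGT\nTTGA\n"
print(fasta_to_raw(example))
print(raw_to_fasta("toy", "ACGTACGTTTGA", width=8))
```

Most web forms used in this practical accept one of the two formats, so a conversion of this kind is usually all that is needed before pasting a sequence.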
ncRNA gene prediction
Choose the eukaryotic tRNA model: the general tRNA model does not give any result!
CpG island prediction
CpG island in the C. elegans cosmid
Length 219 bp; positions 21954 to 22172
cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct gcaacgtctt tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag acgttgggca gacggtcgct atgtgtggcg atggagctaa tgattgtgct gctctgaaag cagctcacgc gggaatctca ctatcggagg ctgaagcatc ga
To confirm that this sequence could be part of a promoter (more than 80% of CpG islands extend into the 5’ flanking region of their associated genes), check from its positions whether this CpG island is located in a gene promoter region (see later).
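As a rough cross-check of the prediction, the statistics usually associated with CpG islands can be computed directly on the sequence above. The sketch below assumes the commonly cited criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); the prediction tool used in the practical may apply different thresholds, and the function name is hypothetical.

```python
# Island sequence copied from the slide (whitespace is ignored).
ISLAND = """
cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct
gcaacgtctt tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag
acgttgggca gacggtcgct atgtgtggcg atggagctaa tgattgtgct gctctgaaag
cagctcacgc gggaatctca ctatcggagg ctgaagcatc ga
"""


def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio)."""
    seq = "".join(seq.split()).upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


length, gc_fraction, obs_exp = cpg_stats(ISLAND)
print("length:", length)
print("GC fraction:", round(gc_fraction, 3))
print("CpG observed/expected:", round(obs_exp, 3))
```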
Gene prediction
with HMM on the complete cosmid sequence
Gene 1
Gene 2
Gene 3
Gene 4
Wrong CDS ?
3 HMM models: firstex, exon_n, lastex
Summary:
[Figure: positions of genes 1, 4, 3 and 2 on the cosmid]
tRNA: 169 to 238
Predicted CpG island: 21954 to 22172 -> in the middle of CDS4: not a ‘classical’ CpG island (not in the 5’ region of a gene)
Gene 1
Gene 1 prediction with HMMgene
One gene found
Gene 1 prediction with HMMgene
With ‘human’: 2 genes found, one on each strand (the one on the minus strand has lower scores).
The programs are ‘trained’ with sequences from specific organisms. The ‘codon bias’, for example, is not the same in different species.
Example of codon usage tables (-> codon bias): http://www.kazusa.or.jp/codon/
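A codon usage table of this kind can be approximated by counting codons in a coding sequence. The sketch below is only an illustration: it reports frequencies per thousand codons (a unit commonly used in codon usage tables), the function name is hypothetical, and the toy CDS is invented.

```python
from collections import Counter


def codon_usage(cds):
    """Return codon frequencies per thousand codons for an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Invented toy CDS: ATG AAG AAG AAA
usage = codon_usage("ATGAAGAAGAAA")
for codon, per_thousand in sorted(usage.items()):
    print(codon, per_thousand)
```

Comparing such tables between two organisms shows why a program trained on human sequences can score a C. elegans gene poorly.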
Gene 1 prediction with Netgene2
Netgene2 gives the positions of the first and last nucleotides of each intron (donor and acceptor splice sites).
[Figure: an intron starts with GT (donor site) and ends with AG (acceptor site)]
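The GT...AG rule can be checked on any candidate intron once its first and last positions are known. The following sketch assumes 1-based, inclusive coordinates (as in the slides); the function name and the toy sequence are hypothetical.

```python
def has_canonical_splice_sites(genomic, intron_start, intron_end):
    """True if the intron (1-based, inclusive) starts with GT and ends with AG."""
    intron = genomic[intron_start - 1:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Invented toy gene: exon (6 nt) + intron (12 nt) + exon (6 nt).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGCTAA"
print(has_canonical_splice_sites(genomic, 7, 18))  # intron boundaries
print(has_canonical_splice_sites(genomic, 6, 18))  # shifted by one nucleotide
```

A check like this only tests the dinucleotides; Netgene2 additionally scores the surrounding sequence context, which this sketch does not attempt.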
Gene 1 prediction with GeneBuilder (organism: no choice… human; option: first and last exon disabled)
Matrix: miscellaneous
One gene found
Gene 1 prediction with GenScan
No choice of organism except vertebrate, maize and Arabidopsis!
Two genes found
FGENESH
One gene found
Summary (gene prediction)
[Figure: gene model drawn from 5’ to 3’, with position labels 1003, 1083, 1305, 1406, 1452, 1661, 1914, 1997 and 2000]
Splice sites (DO: donor site; AC: acceptor site):
DO 1084 (1.00)
AC 1304 (0.77)
DO 1407 (0.89)
AC 1451 (0.90)
DO 1662 (1.00)
AC 1913 (1.00)
HMMgene, GeneBuilder, Netgene2 and GenScan (organism = human!)
1557
(organism = human!)
977
GeneMark: finds a second gene in 3’!
163211
FGENESH
+ another potential gene from positions 2000 to 2900
One gene
ID FGENESH Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR//
ID GENESCAN1 Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR//
ID GENESCAN2 Unreviewed; 202 AA.
SQ SEQUENCE 202 AA; 23684 MW; 98A69FA21823F2F3 CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDKL
VSDKIKLFRE HKILRIRSVQ HI//
ID GENEMARK1 Unreviewed; 184 AA.
SQ SEQUENCE 184 AA; 20255 MW; 85BB0234E6C14EA0 CRC64;
MGRCGSSGKR DGYGAKDSSS EGLSTMKVET CVYSGYKIHP GHGKRLVRTD GKVQIFLSGK ALKGAKLRRN PRDIRWTVLY RIKNKKGTHG QEQVTRKKTK KSVQVVNRAV AGLSLDAILA KRNQTEDFRR QQREQAAKIA KDANKAVRAA KAAANKEKKA SQPKTQQKTA KNVKTAAPRV
GGKR//
ID GENEMARK2 Unreviewed; 183 AA.
SQ SEQUENCE 183 AA; 21336 MW; 64F65D472A58046E CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDNV
QHI//
For fun…
Compare the predictions of the same program (GeneMark) with different parameters (HMM trained on eukaryota or on prokaryota)
Two genes found
Gene 1 prediction with GeneMark (prokaryota specific; E. coli K12)
Protein 1, Protein 2
Gene 1 prediction with GeneMark (prokaryota specific)
Protein 1
Protein 2
A CDS corresponds roughly to an ‘exon’: prokaryotic protein-coding genes have no spliceosomal introns!
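Because a prokaryotic model expects an uninterrupted CDS, its output resembles a list of open reading frames. The sketch below is a naive ORF scan (ATG to the next in-frame stop, forward strand only) and is not GeneMark's method, which relies on a trained statistical model; the function name, the `min_codons` parameter and the toy sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) of ATG...stop ORFs, 1-based inclusive, in the 3 forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG of a potential ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))  # stop codon included
                start = None
    return sorted(orfs)


# Invented toy sequence containing ATG AAA TTT TAG starting at position 3.
print(find_orfs("CCATGAAATTTTAGGG"))
```

Applied to a eukaryotic gene, such a scan returns fragments that stop at the first in-frame stop codon of each intron, which is consistent with the two separate ‘proteins’ reported above.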
Summary (prokaryota gene prediction)
[Figure: gene model drawn from 5’ to 3’, with position labels 1003, 1083, 1305, 1406, 1452, 1661, 1914, 1997 and 2000]
Splice sites (DO: donor site; AC: acceptor site):
DO 1084 (1.00)
AC 1304 (0.77)
DO 1407 (0.89)
AC 1451 (0.90)
DO 1662 (1.00)
AC 1913 (1.00)
HMMgene, GeneBuilder, Netgene2
GenScan
1437 1688
GeneMark (proka)
1254 1433 Protein 1 Protein 2
1557
GeneMark (euka)
Alignment between the ‘eukaryota and prokaryota’ predicted sequences
Gene prediction: similarity searches with ESTs
ESTs: expressed sequence tags (cDNAs sequenced quickly and with a high error rate)
Blast 2012
Gene A Gene B
Two genes found
Blast 2010
Gene A Gene B
EST1
>gi|47590759|gb|BJ750997.1|BJ750997 BJ750997 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 5', mRNA sequence
GGTTTAATTACCCAAGTTTGAGATTCGTCAAGCGAGGGCCTATCAGCAATGAAGGTCGAAACCTGCGTTTACTCCGGATACAAGATCCACCCAGGACACGGAAAGAGACTTGTCCGTACTGACGGAAAGGTGAGTTCAGTTTCTCTTTGAAAGGCGTTAGCATGCTGTTAGAGCTCGTAAGGTATATTGTAATTTTACGAGTGTTGAAGTATTGCAAAAGTAAAGCATAATCACCTTATGTATGTGTTGGTGCTATATCTTCTAGTTTTTAGAAGTTATACCATCGTTAAGCATGCCACGTGTTGAGTGCGACAAACTACCGTTTCATGATTTATTTATTCAAATTTCAGGTCCAAATCTTCCTCAGTGGAAAGGCACTCAAGGGAGCCAAGCTTCGCCGTAACCCACGTGACATCAGATGGACTGTCCTCTACAGAATCAAGAACAAGAAGGGAACCCACGGACAAGAGCAAGTCACCAGAAAGAAGACCAAGAAGTCCGTCCAGGTTGTTAACCGCGCCGTCGCTGGACTTTCCCTTGATGCTATCCTTGCCAAGAGAAACCAGACCGAAGACTTCCGTCGCCAACAGCGTGAACAAGCCGCTAAGATCGCCAA
EST2
>gi|47646579|gb|BJ775052.1|BJ775052 BJ775052 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 3', mRNA sequence
ATAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTCTTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCTTGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCTCTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTCTTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATCTGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTGAAATTTGAATAAATAAATCATGAAACGGTAGTTTGTCGCACTCAACACGTGGCATGCTTAACGATGGTATAACTTCTAAAAACTAGAAGATATAGCACCAACACATACATAAGGTGATTATGCTTTACTTTTGCAATACTTCAACACTCGTAAAATTACAATATACCTTACGAGCTCTAACAGCATGCTAACGCCTTTCAAAGAGAAACTGAACTCACCTTTCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCGACCTTCATTGCTGATANGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCCCA
EST3
>gi|47727995|gb|BJ818152.1|BJ818152 BJ818152 unpublished oligo-capped cDNA library, stage L4 Caenorhabditis elegans cDNA clone yk1685h11 3', mRNA sequence
TAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTC
TTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCT
TGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCT
CTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTC
TTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATC
TGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTT
TCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCG
ACCTTCATTGTTGATAGGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCTACAAATAAAAATG
AGATAAAGCATACTGCCATTCTACAACCGGAGAATAAGAAAACCGAAAACGAGAAAATTATTCTATTATG
ACAGATAGAATAAGTTAAAATGGGAAGAGTGCATTTGTCACTGATTTACTTGGTGACTTGGTGGAGAGCG
TGGGCAAGGTAAGCGACATTGTTCGATGAA
Gene A
Blast result with EST1
Positions of the aligned segments: 975-1407, 1450-1615, 1692-1865
BUT: Blast does not take the intron-exon boundaries into account when aligning DNA with RNA -> we have to use a dedicated tool: SIM4
The third part of EST1 is of very poor quality.
SIM4 alignment
Example with EST1 BJ750997 (partial)
The third part of EST1 is of very poor quality: it is not aligned by SIM4 -> EST1 is considered partial!
SIM4 alignment results
EST1 BJ750997 (partial)
EST2 BJ775052
EST3 BJ818152
Summary (ESTs)
Gene A
[Figure: gene model drawn from 5’ to 3’, with position labels 1003, 1083, 1305, 1406, 1452, 1661, 1914 and 1997]
1615 EST1 BJ750997.1
EST2 BJ775052.1
EST3 BJ818152.1
Alternative splicing event (intron retention) -> 2 different mRNAs
(EST BJ750997.1 is partial)
…
Translation and BLASTp
Translation (beware of the EST sequence orientation!)
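The orientation matters because a 3' read is the reverse complement of the mRNA. The sketch below, which assumes the standard genetic code and uses hypothetical function names, translates a sequence in all six frames. The 15-nucleotide fragment is taken from EST2 (a 3' read); its reverse complement also occurs in EST1 (a 5' read).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate from offset `frame`; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frames(seq):
    """Translate both strands in three frames each; keys are '+1'..'+3', '-1'..'-3'."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            frames[strand + str(frame + 1)] = translate(s, frame)
    return frames


fragment_3prime = "GGTTTCGACCTTCAT"  # as it appears in EST2
print(reverse_complement(fragment_3prime))
print(six_frames(fragment_3prime))
```

Only the reverse-strand frame gives a sensible peptide here, which is why a 3' EST has to be reverse-complemented before translation.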
EST1
MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ
VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIA
Blastp results
EST2
MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPK
TQQKTAKNVKTAAPRVGGKR
Blastp results
EST3
EST1 is partial at the C-terminal end
Gene A
EST1 is partial.
EST3 corresponds to the UniProtKB/Swiss-Prot RL24_CAEEL sequence.
Gene A
Some prediction programs give the correct protein sequence.
None of them predicted the alternative splicing event (EST2; retention of intron 1084-1304).
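The retained intron can be recovered from the exon coordinates alone. The sketch below assumes the Gene A exon coordinates as they can be read from the summary diagrams (1003-1083, 1305-1406, 1452-1661, 1914-1997); the function name is hypothetical and the coordinates should be checked against the actual program output.

```python
def introns_from_exons(exons):
    """Given (start, end) exons, 1-based inclusive, return the introns between them."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


# Assumption: exon coordinates as read from the summary diagram.
GENE_A_EXONS = [(1003, 1083), (1305, 1406), (1452, 1661), (1914, 1997)]

for start, end in introns_from_exons(GENE_A_EXONS):
    length = end - start + 1
    print(start, end, "length", length, "multiple of 3:", length % 3 == 0)
```

The first intron returned is 1084-1304, the one retained in EST2; retaining it inserts extra sequence between exons 1 and 2 of the mRNA, which the gene prediction programs have no way of proposing as an alternative.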
Gene A
Summary (ESTs)
Gene A
[Figure: gene model drawn from 5’ to 3’, with position labels 1003, 1083, 1305, 1406, 1452, 1661, 1914 and 1997]
EST BJ775052.1
EST BJ818152
Alternative splicing event (intron retention) -> 2 different mRNAs
MKVET… 1010
MIYLF… 1284
Gene 1 is on C. elegans chromosome I
BLAT results
Isoform 2, EST2
Gene B, Gene A
RefSeq sequence
>NP_491399 length=159
MKVETCVYSGYKIHPGHGKRLVRTDGKVQIFLSGKALKGAKLRRNPRDIR
WTVLYRIKNKKGTHGQEQVTRKKTKKSVQVVNRAVAGLSLDAILAKRNQT
EDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPKTQQKTAKNVKT
AAPRVGGKR
InterPro scan results: the protein contains a ribosomal L24e domain
Conclusions (1)
There are 2 different protein sequences due to alternative splicing (intron retention). The shortest isoform results from the intron retention and is rarely expressed (only 2 ESTs).
Gene A
Conclusions (2)
Gene prediction programs cannot predict an alternative splicing event (they can only predict the alternative splice junctions).
The protein (Gene A) is a ribosomal protein which belongs to the ribosomal protein L24e family (UniProtKB/Swiss-Prot O01868).
The alternatively spliced sequence is not yet in the protein sequence databases, because it is ‘derived’ from EST sequences, which are submitted to public DNA/RNA databases without an annotated CDS.
Non-coding region analysis
3’ end of chromosome Y (EMBL #AJ271736)
Example of Alu sequence
Gene 2
Summary diagram
[Figure: gene model drawn from 5’ to 3’, with position labels 1111, 789, 1410, 1636, 1688 and 1845]
Splice sites (DO: donor; AC: acceptor):
AC 1112 (0.56)
DO 1409 (0.92)
DO 1556 (0.96)
AC 1637 (0.61)
HMMgene
Netgene2
[Figure: gene model drawn from 5’ to 3’ with Exon 1, Exon 2 and Exon 3; position label 1557]
1112 1407 1637 1688
GeneBuilder prediction is not confirmed anywhere else
CDS2 (3 exons)
RefSeq NP_491393 (AF272397)
UniProtKB/TrEMBL: G5EC89
237 AA; 3 exons
MMMEYGGYFS SSAVAQQSGD VPTTAPSAVT NSFFYTPQSH NIYHQYATPY LQSGRALTTA HNTSSSSAGN STSSSSSSSN YRNTTHDSLQ AFFNTGLQYQ LYQKSQLIGS DTIQRTSSNV LNGLPRSSLV GALCSTGGAP LNPAERRKQR RIRTTFTSGQ LKELERSFCE THYPDIYTRE EIAMRIDLTE ARVQVWFQNR RAKYRKQEKI RRVKDEEEDP LKKEPGQISL EEIIDQI
A probable nuclear protein with a DNA-binding domain (homeobox)
Gene 3
Numbering: ‘direct strand’
CDS3
>tr|O01864|O01864_CAEEL Hypothetical protein - Caenorhabditis elegans. METEVMKSFNNELSSLFDSKNMSKNKIQDITKAAIKAKSQYKHVVFSVEKLINKCKPDQR LNVLYVIDSIVRASKHQLKEKDTFGPRFMKQFDKFLMPLLKCGQKEKMRTVRTLNLWMSN KVFKESEIQPLREMCKASGLTIDFEEVELAVKGKQADMSIYSGVYKKKPKRSSSSSQPKS RTPTNPHPDDGLLGAGPSSALRSVPDIPNFVLSEDYFLGTISEREMLELVQKFGIDRSGV LSKDKNLLQRALQIFAGSLSQKVEEVLAENNRINGSSIQNVLTKDFEYSDDEEEKEKEPQ PEKQKNLPHAQVLLLAQSLLTQPQILAKLAEVLIPQGNPFGLPFPGEHIVPTSSAALTLG APPPNLMALQQSLPPGFPNQQLGLPNLSGLNQAQLMNVQNAQNMLQLQQRAAQLQALQGN PNAQRNLLMLGNPLLNPFALQHGVNPMLNDLQAAAAAQQQAMLNEAAQSPEKKILELSGG NSGINNSGDVERARLREKEKERESKERRRMGLPPVRIGFTIIASRTLWLKKIPTNIVEND LKQAVESCGEASRVKVIGNRACAYITMENRRSANDVVSKMREVSVAKKMVKVYWARSPGM DSDQFSDLWDSNRGVLEIPYEKLPLDLVALCEGAMLDIESLPIEKKLLYKETGETVISIP PPNIQPPVPHPPPMGFPFQHQLTQLPGQPRPAGLPPGVPPMFNLNAPPPPGIPGYPPAPP PPGVGPPPPQGIPPMGFDPNKPPPPMFQQGFNAGAPPPPFGRGAGPMSSFPPPPRGGMHH MPPPPSFRGGRGGHGGPPPPHFDRRGGGGPPFRPENGRGRLLDQSEMWNREQREMRGGGG AGRDGGREHRDYDRDRSQIDRRRQDDMGARRRSRWGDDDRRDDDRRDDRRDDRRESRRRS PRSPRSPDRRTRRSPSYEREEPPVKKTSVEEETVSSTTLDELKPSVEPTPVPAPIPAPAP
ELKAAEEPVKIVAEHHEDQTDEVPMDLE
Gene 4
Removed from gene 4: 1412-1691, 1795-5682, 5842-6048, 6865-6907, 7133-7413, 7518-7589, 7754-7799, 7912-7958, 8154-8222, 8414-8496, 8660-8709, 9043-9114, 9529-9573, 9706-9769, 9943-9996
EST HMMgene WebGene Netgene2
1346 1411 (AG) (GT)
1695 1794 1691 1795
5405 5449
5679 5841 5668 5859 5683 5841 5682 5842
6049 6080 6049 6864 6049 6864 6048 6865
6908 6993 6908 7132 6908 7132 6907 7133
7187 7328 7187 7328 7186 7329
7411 7520 7414 7517 7414 7517 7413 7518
7564 7589
7959 8153 7958 8154
7589 7753 7589 7754
7800 7911 7800 7911 7799 7912
7954 8113 7959 8135
8223 8413 8223 8413 8222 8414
8497 8659 8497 8659 8496 8660
8710 9042 8710 9042 8709 9043
9115 9528 9115 9528 9114 9529
9631 9705 9574 9705 9574 9705 9573 9706
9770 9943 9770 9946 9770 9942 9943
9997 10350 9996
Protein Q9N323
>tr|Q9N323|Q9N323_CAEEL Hypothetical protein - Caenorhabditis elegans. MSTNNYQTLSQNKADRMGPGGSRRPRNSQHATASTPSASSCKEQQKDVEHEFDIIAYKTT FWRTFFFYALSFGTCGIFRLFLHWFPKRLIQFRGKRCSVENADLVLVVDNHNRYDICNVY YRNKSGTDHTVVANTDGNLAELDELRWFKYRKLQYTWIDGEWSTPSRAYSHVTPENLASS APTTGLKADDVALRRTYFGPNVMPVKLSPFYELVYKEVLSPFYIFQAISVTVWYIDDYVW YAALIIVMSLYSVIMTLRQTRSQQRRLQSMVVEHDEVQVIRENGRVLTLDSSEIVPGDVL VIPPQGCMMYCDAVLLNGTCIVNESMLTGESIPITKSAISDDGHEKIFSIDKHGKNIIFN GTKVLQTKYYKGQNVKALVIRTAYSTTKGQLIRAIMYPKPADFKFFRELMKFIGVLAIVA FFGFMYTSFILFYRGSSIGKIIIRALDLVTIVVPPALPAVMGIGIFYAQRRLRQKSIYCI SPTTINTCGAIDVVCFDKTGTLTEDGLDFYALRVVNDAKIGDNIVQIAANDSCQNVVRAI ATCHTLSKINNELHGDPLDVIMFEQTGYSLEEDDSESHESIESIQPILIRPPKDSSLPDC QIVKQFTFSSGLQRQSVIVTEEDSMKAYCKGSPEMIMSLCRPETVPENFHDIVEEYSQHG YRLIAVAEKELVVGSEVQKTPRQSIECDLTLIGLVALENRLKPVTTEVIQKLNEANIRSV MVTGDNLLTALSVARECGIIVPNKSAYLIEHENGVVDRRGRTVLTIREKEDHHTERQPKI VDLTKMTNKDCQFAISGSTFSVVTHEYPDLLDQLVLVCNVFARMAPEQKQLLVEHLQDVG QTVAMCGDGANDCAALKAAHAGISLSEAEASIAAPFTSKVADIRCVITLISEGRAALVTS YSAFLCMAGYSLTQFISILLLYWIATSYSQMQFLFIDIAIVTNLAFLSSKTRAHKELAST PPPTSILSTASMVSLFGQLAIGGMAQVAVFCLITMQSWFIPFMPTHHDNDEDRKSLQGTA IFYVSLFHYIVLYFVFAAGPPYRASIASNKAFLISMIGVTVTCIAIVVFYVTPIQYFLGC LQMPQEFRFIILAVATVTAVISIIYDRCVDWISERLREKIRQRRKGA
Prediction of mitochondrial genes (human)
NC_012920.1
Mitochondrial genome NC_012920.1 annotation
tRNA scan prediction
tRNA scan lists:
1- all the tRNAs on the current strand
2- all the tRNAs on the complementary strand
This tRNA is found at the end of the list
Conclusion
• Good tRNA prediction
• If you try: very bad protein-coding gene prediction…
– The mitochondrial genome does not have the same sequence content (codon bias, signals) as the nuclear genome.
– You might try a ‘prokaryota’-like gene model, but the results are not perfect!