Assignments 8-12 (liacs.leidenuniv.nl/~bakkerem2/cmb2018/CMB2018_Lecture09.pdf)


Note. The presented slides are intended to illustrate the strategies to obtain the solutions for the assignments. It is not required to submit the answers in such a format. Short answers that explain the obtained results are sufficient.

Last slide presents a correction to the assignment 3(a) solution.


8. The complete genome of a Zika virus strain, isolated in 2016 from a traveler after returning to The Netherlands from Suriname, was determined at the Leiden University Medical Center and deposited in the nucleotide sequence database under accession KY348640. Using the Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/), construct a multiple alignment of this genome and those from some previous isolates: Uganda, 1947 (accession NC_012532), Philippines, 2012 (KU681082) and French Polynesia, 2013 (KJ776791). Are these sequences very similar? How many gaps can be found in the alignment? What are their lengths? Identify what amino acid residues are encoded by nucleotides inserted within the genome coding region.

Accessions -> Fasta file -> Clustal Omega -> alignment

Gaps can be identified in the alignment, e.g. by searching for "-" with the browser's find function (Cmd+F):

NC_012532_Uganda1947 AGTATCAACAGGTTTAA-TTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCCA 119
KU681082_PHL/2012    AGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAA 120
KY348640_SL1602      AGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAA 120
KJ776791_H/PF/2013   AGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAA 120
                     *************** * **************************************** *
                                      ↑ position 78 (KY348640): 1 nucleotide (nt)

NC_012532_Uganda1947 TGGAGTATCGGATAATGCTATCAGTGCATGGCTCCCAGCATAGCGGGATGATTGGATAT- 1438
KU681082_PHL/2012    TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
KY348640_SL1602      TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
KJ776791_H/PF/2013   TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
                     ******* *********** ***** ************** ** ******** *   **

NC_012532_Uganda1947 -----------GAAACTGACGAAGATAGAGCGAAAGTCGAGGTTACGCCTAATTCACCAA 1487
KU681082_PHL/2012    ACACAGGACATGAAACTGATGAGAATAGAGCGAAGGTTGAGATAACGCCCAATTCACCAA 1500
KY348640_SL1602      ACACAGGACATGAAACTGATGAGAATAGAGCGAAAGTTGAGATAACGCCCAATTCACCAA 1500
KJ776791_H/PF/2013   ACACAGGACATGAAACTGATGAGAATAGAGCGAAGGTTGAGATAACGCCCAATTCACCAA 1500
                                ******** **  ********** ** *** * ***** **********
positions 1440-1451: 12 nt (the gap spans the last column of the previous block and the first 11 columns of this one)

NC_012532_Uganda1947 CCCCGGAAAACGCAAAACAGCATATTGACGT-GGGAAAGACCAGAGACTCCATGAGTTTC 10726
KU681082_PHL/2012    CCCCGGAAAACGCAAAACAGCATATTGACGCTGGGAAAGACCAGAGACTCCATGAGTTTC 10740
KY348640_SL1602      CCCCGGAAAACGCAAAACAGCATATTGACGCTGGGAAAGACCAGAGACTCCATGAGTTTC 10740
KJ776791_H/PF/2013   CCCCGGAAAACGCAAAACAGCATATTGACGCTGGGAAAGACCAGAGACTCCATGAGTTTC 10740
                     ******************************  ****************************
                                                    ↑ position 10712: 1 nt

NC_012532_Uganda1947 TGGTTTCT 10794
KU681082_PHL/2012    TGGGTCT- 10807
KY348640_SL1602      TGGGTCT- 10807
KJ776791_H/PF/2013   TGGGTCT- 10807
                     *** *
3' end, position 10807: 1 nt
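The manual search for "-" characters can also be scripted. The following is a minimal sketch, not part of the original slides: the helper names `gap_runs` and `sequence_position` are hypothetical, and the two rows are copied from the first alignment fragment above. For a full alignment, the wrapped blocks of each sequence would first have to be joined into one row so that a gap crossing a line break is counted once.

```python
import re

def gap_runs(aligned_row):
    """Return (start_column, length) for every run of '-' in one aligned row.
    Columns are 1-based and refer to the alignment, not to the sequence."""
    return [(m.start() + 1, len(m.group())) for m in re.finditer(r"-+", aligned_row)]

def sequence_position(reference_row, column, block_start):
    """Map an alignment column to a position in the reference sequence.
    block_start is the sequence position of the first residue in this row."""
    residues_before = len(reference_row[:column - 1].replace("-", ""))
    return block_start + residues_before

# Rows copied from the first alignment fragment (Uganda 1947 and KY348640).
uganda = "AGTATCAACAGGTTTAA-TTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCCA"
ky348640 = "AGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAA"

runs = gap_runs(uganda)

# The KY348640 row ends at position 120 and holds 60 residues, so it starts at 61.
block_start = 120 - len(ky348640.replace("-", "")) + 1
positions = [(sequence_position(ky348640, col, block_start), length)
             for col, length in runs]

print(runs)       # [(18, 1)]
print(positions)  # [(78, 1)]
```

The reported position 78 and length 1 agree with the annotation under the first fragment.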


Coding region (CDS) in KY348640: positions 108-10379

The only gap in the coding region: positions 1440-1451: 12 nt <=> 4 amino acids

Amino acids in the insertion can be identified by the alignment of encoded proteins APO36913 (Suriname strain) vs. YP_002790881 (Uganda strain).

A fragment of the alignment (BLAST for 2 sequences; Query is the Uganda strain, Sbjct is the Suriname strain):

Query  421  QPENLEYRIMLSVHGSQHSGMI----GYETDEDRAKVEVTPNSPRAEATLGGFGSLGLDC  476
            QPENLEYRIMLSVHGSQHSGMI    G+ETDE+RAKVE+TPNSPRAEATLGGFGSLGLDC
Sbjct  421  QPENLEYRIMLSVHGSQHSGMIVNDTGHETDENRAKVEITPNSPRAEATLGGFGSLGLDC  480

Insertion: VNDT sequence (amino acid positions 443-446 in the protein of the Suriname strain, counted from the BLAST coordinates above)


Alignment yielded by Clustal Omega: inserted sequence is

GAC ACA GGA CAT

Alignment based on amino acid sequences: inserted sequence is

GTT AAT GAC ACA

The same fragment with the gap placed according to the amino acid alignment:

NC_012532_Uganda1947 TGGAGTATCGGATAATGCTATCAGTGCATGGCTCCCAGCATAGCGGGATGATT------- 1432
KU681082_PHL/2012    TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
KY348640_SL1602      TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
KJ776791_H/PF/2013   TGGAGTACCGGATAATGCTGTCAGTTCATGGCTCCCAGCACAGTGGGATGATCGTTAATG 1440
                     ******* *********** ***** ************** ** ********

NC_012532_Uganda1947 -----GGATATGAAACTGACGAAGATAGAGCGAAAGTCGAGGTTACGCCTAATTCACCAA 1487
KU681082_PHL/2012    ACACAGGACATGAAACTGATGAGAATAGAGCGAAGGTTGAGATAACGCCCAATTCACCAA 1500
KY348640_SL1602      ACACAGGACATGAAACTGATGAGAATAGAGCGAAAGTTGAGATAACGCCCAATTCACCAA 1500
KJ776791_H/PF/2013   ACACAGGACATGAAACTGATGAGAATAGAGCGAAGGTTGAGATAACGCCCAATTCACCAA 1500
                          *** ********** **  ********** ** *** * ***** **********
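To see why the placement of the 12-nt gap matters, both candidate insertions (GAC ACA GGA CAT and GTT AAT GAC ACA) can be translated. This is a small sketch added for illustration: it assumes the standard genetic code, the table holds only the six codons needed here, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Subset of the standard genetic code (assumption: only the codons used below).
CODON_TABLE = {
    "GTT": "V", "AAT": "N", "GAC": "D",
    "ACA": "T", "GGA": "G", "CAT": "H",
}

def translate(nucleotides):
    """Translate an in-frame nucleotide string (spaces allowed) codon by codon."""
    nt = nucleotides.replace(" ", "").upper()
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))

clustal_insert = translate("GAC ACA GGA CAT")        # gap as placed by Clustal Omega
protein_based_insert = translate("GTT AAT GAC ACA")  # gap as placed by the protein alignment

print(clustal_insert)        # DTGH
print(protein_based_insert)  # VNDT
```

Only the second placement reproduces the VNDT insertion seen in the protein alignment, which is why the answer is based on the amino acid sequences rather than on the nucleotide alignment alone.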


9. The CRISPR-Cas9 systems of RNA-guided DNA editing exploit diverse Cas9 proteins. Using the Clustal Omega program (http://www.ebi.ac.uk/Tools/msa/clustalo/), construct a multiple alignment of representative amino acid sequences of proteins from different organisms, used in type II CRISPR systems of different subtypes: Cas9 of Streptococcus pyogenes, accession WP_038434062; Listeria monocytogenes, WP_061665472 (both subtype II-A); Francisella novicida, A0Q5Y3 (subtype II-B); Staphylococcus aureus, J7RUA5, and Actinomyces naeslundii, J3F2B0 (both subtype II-C). Are these sequences very similar? What is the length of the longest stretch of amino acid residues completely conserved in all 5 sequences according to the Clustal Omega alignment? Give this motif in single letter code. Explore the phylogenetic tree yielded by Clustal Omega: is its clustering consistent with the subtype classification?

Accessions -> Fasta file -> Clustal Omega -> alignment

The sequences do not seem to be very similar: the alignment contains many gaps, and the longest stretches of 100% conserved residues are just 3 amino acids long: ARR and DHI.

A fragment of the alignment:

J3F2B0_AnaCas9       GKEGKKDHDTRKKLSGIARRARRLLHHRRTQLQQLDEVLR-------------------- 91
A0Q5Y3_FnCas9        SKD----SYT---LLMNNRTARRHQRRGIDRKQLVKRLFKLIWTEQLNLEWDKDTQQAIS 96
J7RUA5_SaCas9        FKEANVENNE---GRRSKRGARRLKRRRRHRIQRVKKLLF---DYN-------------- 75
WP_061665472_LmCas9  FDDGQ--TAV---DRRMNRTARRRIERRRNRISYLQEIFA---VEMANID---------A 95
WP_038434062_SpyCas9 FDSGE--TAE---ATRLKRTARRRYTRRKNRICYLQEIFS---NEMAKVD---------D 95
                      ..               * ***   :   :   :..::

A fragment of the alignment:

J3F2B0_AnaCas9       YISRGDIVRLDALELQGCACLYCGTTIG-------YHTCQLDHIVPQAGPG--SNNRRGN 597
A0Q5Y3_FnCas9        F--KDKNN--RIKEFAKGISAYSGANLTDGDF--DGAKEELDHIIPRSHKKYGTLNDEAN 986
J7RUA5_SaCas9        A--KYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFN---N 571
WP_061665472_LmCas9  Q--ELKNNRLYLYYLQNGKDMYTGQELD----IHNLSNYDIDHIVPQSFITDNSID---N 857
WP_038434062_SpyCas9 T--QLQNEKLYLYYLQNGRDMYVDQELD----INRLSDYDVDHIVPQSFLKDDSID---N 854
                        .          :      *    :            ::***:*::     : :   *
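The search for the longest completely conserved stretch can be sketched in a few lines. This is an illustration only, assuming the aligned rows have already been joined into equal-length strings; the function name `conserved_runs` is hypothetical, and the input here is just the first fragment shown above, not the full alignment.

```python
def conserved_runs(rows):
    """Return (start_column, motif) for each maximal run of alignment columns
    in which every row carries the same residue (gap columns never count)."""
    runs, motif, start = [], "", 0
    for col, column in enumerate(zip(*rows), start=1):
        if len(set(column)) == 1 and column[0] != "-":
            if not motif:
                start = col
            motif += column[0]
        else:
            if motif:
                runs.append((start, motif))
            motif = ""
    if motif:
        runs.append((start, motif))
    return runs

# First alignment fragment from the slide (60 columns per row).
rows = [
    "GKEGKKDHDTRKKLSGIARRARRLLHHRRTQLQQLDEVLR--------------------",  # AnaCas9
    "SKD----SYT---LLMNNRTARRHQRRGIDRKQLVKRLFKLIWTEQLNLEWDKDTQQAIS",  # FnCas9
    "FKEANVENNE---GRRSKRGARRLKRRRRHRIQRVKKLLF---DYN--------------",  # SaCas9
    "FDDGQ--TAV---DRRMNRTARRRIERRRNRISYLQEIFA---VEMANID---------A",  # LmCas9
    "FDSGE--TAE---ATRLKRTARRRYTRRKNRICYLQEIFS---NEMAKVD---------D",  # SpyCas9
]

runs = conserved_runs(rows)
longest = max(runs, key=lambda run: len(run[1]))

print(runs)     # [(19, 'R'), (21, 'ARR')]
print(longest)  # (21, 'ARR')
```

Within this fragment the longest run is ARR, matching the "***" in the consensus line; running the same function on the full alignment would be needed to confirm that no longer motif exists elsewhere.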


The tree yielded by Clustal Omega (figure not reproduced in this transcript):

Its clustering is only partially consistent with the subtype classification. The two subtype II-A sequences (SpyCas9 and LmCas9) are indeed clustered together, but the subtype II-C proteins (SaCas9 and AnaCas9) are not.


10. Using the database of protein profiles InterPro (http://www.ebi.ac.uk/interpro/), determine the positions of structural domains in one of the proteins from the 1918 "Spanish flu" influenza virus, accession AAK14368. What kind of proteins share the same structural fold in their RNA-binding domains as this viral protein?

--> RNA-binding domain at the positions 1-70.

--> This structural fold is also shared by ribosomal proteins S15.

--> Other signatures detected: positions 79-205 : NS1 effector domain; 205-230 : “mobidb-lite (disorder_prediction)”.


11. A viral protein (RdRp from Beihai hepe-like virus 2, accession APG77551), identified in a metagenomic study of viromes in various species, was suggested to be similar to some eukaryotic proteins. Using the BLAST program, identify the human protein that is most similar to this viral RdRp. What are the name and accession number of the human homolog? Does the region of relatively high sequence similarity in the BLAST alignment of the two proteins correspond to some known conserved domain motif? Scan both protein sequences in the databases of protein profiles PROSITE (http://prosite.expasy.org/) and InterPro (http://www.ebi.ac.uk/interpro/): what are the profiles found in the similarity regions yielded by the BLAST alignment?

NCBI BLAST output for APG77551, RdRp [Beihai hepe-like virus 2] (printed 12/09/2017, https://blast.ncbi.nlm.nih.gov/Blast.cgi)

Sequences producing significant alignments:

Description | Max score | Total score | Query cover | E value | Ident | Accession
PREDICTED: ERI1 exoribonuclease 3 isoform X1 [Homo sapiens] | 117 | 117 | 7% | 6e-27 | 39% | XP_016857790.1
ERI1 exoribonuclease 3 isoform 1 [Homo sapiens] | 117 | 117 | 7% | 7e-27 | 39% | NP_076971.1
prion protein interacting protein, isoform CRA_b [Homo sapiens] | 112 | 112 | 7% | 9e-27 | 39% | EAX07043.1
ERI1 exoribonuclease 3 isoform 2 [Homo sapiens] | 114 | 114 | 7% | 1e-26 | 39% | NP_001288627.1
Chain A, Crystal Structure Of Human Eri1 Exoribonuclease 3 | 113 | 113 | 7% | 1e-26 | 39% | 2XRI_A
unknown [Homo sapiens] | 115 | 115 | 7% | 6e-26 | 39% | AAC19158.1
PREDICTED: ERI1 exoribonuclease 3 isoform X5 [Homo sapiens] | 99.8 | 99.8 | 6% | 4e-22 | 37% | XP_016857793.1
ERI1 exoribonuclease 3 isoform 4 [Homo sapiens] | 91.3 | 91.3 | 5% | 8e-20 | 38% | NP_001288630.1
prion protein interacting protein, isoform CRA_a [Homo sapiens] | 92.0 | 92.0 | 5% | 2e-19 | 38% | EAX07042.1
ERI1 exoribonuclease 3 isoform 3 [Homo sapiens] | 90.9 | 90.9 | 5% | 3e-19 | 41% | NP_001288629.1
PREDICTED: ERI1 exoribonuclease 3 isoform X2 [Homo sapiens] | 92.4 | 92.4 | 5% | 7e-19 | 41% | XP_005271243.1
ERI3 protein [Homo sapiens] | 66.2 | 66.2 | 3% | 2e-11 | 40% | AAH01072.1
similar to C. elegans hypothetical protein; similar to AF038615 (PID:g2736329) [Homo sapiens] | 58.9 | 58.9 | 3% | 2e-09 | 39% | AAC04618.1
hCG1774090 [Homo sapiens] | 34.7 | 34.7 | 3% | 8.4 | 27% | EAW86370.1

Alignments:

PREDICTED: ERI1 exoribonuclease 3 isoform X1 [Homo sapiens]
Sequence ID: XP_016857790.1  Length: 335  Number of Matches: 1
Range 1: 144 to 321
Score: 117 bits(292)  Expect: 6e-27  Method: Compositional matrix adjust.  Identities: 70/178(39%)  Positives: 92/178(51%)  Gaps: 8/178(4%)

Query  1123  YLYLDFEATCDDKFI-VQEIIEFPVIGYQDQKEVFR--FHAYVKPK-RSRVTPYCTNLTG  1178
             +L LDFEATCD   I  QEIIEFP++    +       FH YV+P    ++TP+CT LTG
Sbjct  144   FLVLDFEATCDKPQIHPQEIIEFPILKLNGRTMEIESTFHMYVQPVVHPQLTPFCTELTG  203

Query  1179  ITQQKVDQCEEFIVVYDAFLEWFKQHVKGD----FLFITCGDWDLNKMLPSQLIYYKRSI  1234
             I Q  VD       V +   EW  +    D     +F+TCGDWDL  MLP Q  Y    +
Sbjct  204   IIQAMVDGQPSLQQVLERVDEWMAKEGLLDPNVKSIFVTCGDWDLKVMLPGQCQYLGLPV  263

Query  1235  DPIFRKYKNLKHIFRDQFKFKKTVDMMQMLQYLNIAHYGVHHSGIDDCVNIAAIHNKL  1292
                F+++ NLK  +           ++ M + L++ H G  HSGIDDC NIA I   L
Sbjct  264   ADYFKQWINLKKAYSFAMGCWPKNGLLDMNKGLSLQHIGRPHSGIDDCKNIANIMKTL  321

ERI1 exoribonuclease 3 isoform 1 [Homo sapiens]
Sequence ID: NP_076971.1  Length: 337  Number of Matches: 1
Range 1: 146 to 323
Score: 117 bits(292)  Expect: 7e-27  Method: Compositional matrix adjust.  Identities: 70/178(39%)  Positives: 92/178(51%)  Gaps: 8/178(4%)

Query  1123  YLYLDFEATCDDKFI-VQEIIEFPVIGYQDQKEVFR--FHAYVKPK-RSRVTPYCTNLTG  1178
             +L LDFEATCD   I  QEIIEFP++    +       FH YV+P    ++TP+CT LTG
Sbjct  146   FLVLDFEATCDKPQIHPQEIIEFPILKLNGRTMEIESTFHMYVQPVVHPQLTPFCTELTG  205

Query  1179  ITQQKVDQCEEFIVVYDAFLEWFKQHVKGD----FLFITCGDWDLNKMLPSQLIYYKRSI  1234
             I Q  VD       V +   EW  +    D     +F+TCGDWDL  MLP Q  Y    +
Sbjct  206   IIQAMVDGQPSLQQVLERVDEWMAKEGLLDPNVKSIFVTCGDWDLKVMLPGQCQYLGLPV  265

Query  1235  DPIFRKYKNLKHIFRDQFKFKKTVDMMQMLQYLNIAHYGVHHSGIDDCVNIAAIHNKL  1292
(the remainder of this alignment is cut off in the source)

A BLAST search with APG77551 against human proteins yields an alignment to human ERI1 exoribonuclease 3.

Similar alignments are obtained to the predicted protein XP_016857790 and the reference sequence NP_076971.

BLAST Results (blastp suite, RID VF5VT8JZ014, expires on 09-13 16:14)

Job title: APG77551:RdRp [Beihai hepe-like virus 2]
Query ID: APG77551.1
Description: RdRp [Beihai hepe-like virus 2]
Molecule type: amino acid
Query length: 2313
Database name: nr
Database description: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects
Program: BLASTP 2.7.0+

Putative conserved domains have been detected.

[Graphic Summary: distribution of the top 14 BLAST hits on 14 subject sequences along the query, with a color key for alignment scores]

The aligned region 1123-1292 corresponds to a conserved domain detected by BLAST.


Scan in PROSITE: viral RdRp -> domains 53-232 and 2083-2194, both outside the similarity region 1123-1292; human ERI1 exoribonuclease 3 -> no hits.

Scan in InterPro:

A ribonuclease H-like domain is found in both proteins at positions approximately corresponding to the similarity region yielded by BLAST (1123-1292 vs. 144-321). InterPro also identifies the signature “Exonuclease, RNase T/DNA polymerase III” in the same domain.


12. A substitution Q157R (Gln157->Arg) in the splicing factor U2AF1 (accession NP_001020374) is associated with development of several types of cancer. The substitution is determined by the change of CAG codon to CGG.
(a) Determine the nucleotide position of this mutation in the RNA transcript encoding the U2AF1 isoform with the accession NP_001020374. In which of the exons of this transcript is it located?
(b) Determine the protein domains in U2AF1 using the InterPro database (http://www.ebi.ac.uk/interpro/). In which of the domains does the substitution occur?

NP_001020374.1 splicing factor U2AF 35 kDa subunit isoform b [Homo sapiens]
  1 maeylasifg tekdkvncsf yfkigacrhg drcsrlhnkp tfsqtiliqn iyrnpqnsaq
 61 tadgshcavs dvemqehyde ffeevfteme ekygeveemn vcdnlgdhlv gnvyvkfrre
121 edaekavidl nnrwfngqpi haelspvtdf reaccrqyem gectrggfcn fmhlkpisre
181 lrrelygrrr kkhrsrsrsr errsrsrdrg rggggggggg gggrerdrrr srdrersgrf

Q157: the q in the block "reaccrqyem" (positions 151-160)

--> This protein is coded by the mRNA transcript NM_001025203 (Entrez hard link)

--> Annotation of NM_001025203:
    CDS 85..807
    exons 1..128 129..216 217..283 284..333 334..432 433..566 567..659 660..953

--> Calculation of nucleotide positions that encode Q157: 84 + (157 × 3) = 555 => codon 553-555 codes for amino acid Q157.

--> A fragment of transcript NM_001025203:

541 gcctgctgcc gtcagtatga gatgggagaa tgcacacgag gcggcttctg caacttcatg

--> Substitution A554->G, located in exon 6 (433..566) => CAG->CGG
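The arithmetic above can be checked with a short sketch. The CDS start and exon coordinates are taken from the NM_001025203 annotation quoted on the slide; the function names `codon_positions` and `exon_of` are hypothetical helpers introduced for this illustration.

```python
CDS_START = 85  # first nucleotide of the start codon (CDS 85..807)
EXONS = [(1, 128), (129, 216), (217, 283), (284, 333),
         (334, 432), (433, 566), (567, 659), (660, 953)]

def codon_positions(residue_number, cds_start=CDS_START):
    """Return the (first, last) transcript positions of the codon for a residue."""
    first = cds_start + 3 * (residue_number - 1)
    return first, first + 2

def exon_of(position, exons=EXONS):
    """Return the 1-based number of the exon containing a transcript position."""
    for number, (start, end) in enumerate(exons, start=1):
        if start <= position <= end:
            return number
    return None

first, last = codon_positions(157)
mutated_position = first + 1  # CAG -> CGG changes the middle base of the codon

print(first, last)                # 553 555
print(mutated_position)           # 554
print(exon_of(mutated_position))  # 6
```

The expression 85 + 3 × (157 - 1) gives the first base of the codon, which is equivalent to the slide's 84 + (157 × 3) for its last base.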


InterPro scan: the substitution Q157R occurs in the Zinc finger motif


3. The human gene for RNA binding protein Hermes (Gene ID: 11030) contains several exons, and several protein isoforms are produced by alternative splicing that may lead to a shift in the translation frame. In particular, the NCBI Gene database describes that transcript variants (2), accession NM_001008711.2, and (3), accession NM_001008712.2, lack some of the exons and contain alternative ones, compared to transcript (1), accession NM_001008710.2.
(a) Determine which of the exons of the transcript (1) are missing in the transcripts (2) and (3).
(b) The three transcripts encode protein isoforms which have different C-termini. Determine these C-terminal amino acid sequences.

The annotation of exons in the “FEATURES” datafields:

(1) NM_001008710.2   (2) NM_001008711.2   (3) NM_001008712.2
1..658               1..658               1..658
659..736             659..736             659..736
737..775             737..775             737..775
776..838             776..838             776..838
839..989             839..989             839..989
990..1120            990..1120            990..1120
1121..1190           1121..1206           1121..1353
1191..1294           1207..1276
1295..2919           1277..1380
                     1381..3005

Incorrect: The last three exons of the transcript (1) are absent in the transcripts (2) and (3). Alternative splicing produces different mRNA sequences.

Correction: Exon comparisons can be done by BLAST for 2 sequences, e.g. transcript (1) vs. transcript (2). The last three exons of the transcripts (1) and (2) are absent in the transcript (3). Transcripts (2) and (3) have unique alternative exons, 1121-1206 and 1121-1353, respectively. Alternative splicing produces different mRNA sequences.
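A quick consistency check on the exon annotation is to compare exon lengths between the transcripts. The sketch below uses the coordinates from the FEATURES listing; the names `EXONS` and `lengths` are hypothetical. Equal lengths only suggest that two exons correspond to each other, so the BLAST comparison of the sequences remains the actual evidence.

```python
# Exon coordinates per transcript, as listed in the FEATURES annotation.
EXONS = {
    1: [(1, 658), (659, 736), (737, 775), (776, 838), (839, 989),
        (990, 1120), (1121, 1190), (1191, 1294), (1295, 2919)],
    2: [(1, 658), (659, 736), (737, 775), (776, 838), (839, 989),
        (990, 1120), (1121, 1206), (1207, 1276), (1277, 1380), (1381, 3005)],
    3: [(1, 658), (659, 736), (737, 775), (776, 838), (839, 989),
        (990, 1120), (1121, 1353)],
}

def lengths(exons):
    """Return the length of each exon from inclusive (start, end) coordinates."""
    return [end - start + 1 for start, end in exons]

for transcript, exons in EXONS.items():
    print(transcript, lengths(exons))

# The last three exons of (1) and (2) have the same lengths (70, 104, 1625),
# while (2) carries an extra 86-nt exon and (3) ends with a 233-nt exon.
print(lengths(EXONS[1])[-3:] == lengths(EXONS[2])[-3:])  # True
```

The matching lengths are in line with the corrected answer: transcript (2) keeps the last three exons of transcript (1) after its extra exon 1121-1206, and only transcript (3) lacks them.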