Swetlana Nikolajewa and Thomas Wilhelm

1
Swetlana Nikolajewa and Thomas Wilhelm Institute of Molecular Biotechnology Jena, Germany, http://www.imb-jena.de/tsb Jena Institute of Molecular Biotechnology We present a new classification scheme of the genetic code, based on a binary representation of purines (A, G = 1) and pyrimidines (U, C = 0). On the basis of this logical organization we detect codon regularities (strong, mixed and weak codons) and patterns of amino acid properties and show symmetry characteristics of the genetic code. We present the “codon - reverse codon” pattern and use it to remove the only ambiguity which had remained in our recently proposed new classification scheme of the genetic code. The purine-pyrimidine scheme has now found its final form. We show that the “codon - reverse codon” pattern is correlated to aspects of the tRNA anticodon distribution in all organisms. It is well known that there is no tRNA with an anticodon for any of the STOP codons; we found that there is also no tRNA containing reverse STOP anticodons. Code Code Strong codons Strong codons 6 hydrogen bonds Mixed codons Mixed codons 5 hydrogen bonds Mixed codons Mixed codons 5 hydrogen bonds Weak codons Weak codons 4 hydrogen bonds 000 Pro CC (C/U) Ser UC (C/U) Leu CU (C/U) Phe UU (C/U) 001 Pro CC (A/G) Ser UC (A/G) Leu CU (A/G) Leu UU (A/G) 100 Ala GC (C/U) Thr AC (C/U) Val GU (C/U) Ile AU (C/U) 101 Ala GC (A/G) Thr AC (A/G) Val GU (A/G) Ile / Met AU (A/G) 010 Arg CG (C/U) Cys UG (C/U) His CA (C/U) Tyr UA (C/U) 011 Arg CG (A/G) Stop / Trp UG (A/G) Gln CA (A/G) Stop UA (A/G) 110 Gly GG (C/U) Ser AG (C/U) Asp GA (C/U) Asn AA (C/U) 111 Gly GG (A/G) Arg AG (A/G) Glu GA (A/G) Lys AA (A/G) The classification scheme of the genetic code is based on a binary representation of purines (G,A) - 1, pyrimidines (C,U) - 0. The eight rows (2 3 = 8) and four columns are sufficient to place the 20 amino acids, as well as the termination codons. The four combinations of the first column (CC*, GC*, CG*, and GG*) always imply 6 hydrogen bonds in complementary base-paring with the corresponding anticodon of the tRNA. Accordingly, they are called strong codons. Each row contains exactly 4 different amino acids (including the stop codon). Exceptions in the standard code are the second row with two leucines and the fourth row with the AU* start codon. The first two bases guarantee 5 H-bonds: mixed codons. T T he codon-anticodon symmetry he codon-anticodon symmetry: the red horizontal line marks the symmetry axis. For instance, codon CCC (Pro, first column, first row) has the anticodon GGG (Gly, first column, last row). In 8 different mitochondrial codes UG(A/G) encodes Trp. Weak codons have just 4 H- bonds . The common table of the standard genetic code “Codon - Reverse Codon” Pattern 1.The table can be divided into four blocks (codon - reverse codon groups) of the same size, for instance the upper left block with Pro (P), Ser (S), Ala (A) and Thr (T). Each block shows the same arrow pattern. All strongly evolutionary conserved groups of amino acids (Thompson et al., 1994) are subsets of exactly one codon - reverse codon group, e.g. the MILV amino acids belong to the upper right block in the table. The other conserved strong groups belonging to one block are STA, NEQK, NHQK, NDEQ, QHRK, MILF, HY. The only exception is FYW. 2. We studied all known tRNA genes of 104 different organisms (Sprinzl et al., 1999). It is known that the STOP codons do not have any tRNA. We found that there are also no tRNA genes containing anticodons reverse to the STOP anticodons (ACT, ATC and ATT in Table 2). The only exception is H. sapiens, possessing one tRNA Asn gene with the anticodon ATT, but humans also have three different possible suppressor tRNA genes (Lowe and Eddy, 1997). Codons Reverse Codons AA Anticodons # tRNAs AA Anticodons # tRNAs Stop U UGA TCA 0 Ser AG U U ACT 0 Stop U UAG CTA 0 Asp GA U U ATC 0 Stop U UAA TTA 0 Asn AA U U ATT 1 Tyr U UAC GTA 189 His CA U U ATG 0 Trp U UGG CCA 144 Gly GG U U ACC 0 Leu U UUA TAA 127 Ile AU U U AAT 1 Leu U UUG CAA 134 Val GU U U AAC 18 Phe U UUC GAA 154 Leu CU U U AAG 28 Ser U UCA TGA 151 Thr AC U U AGT 19 Ser U UCC GGA 116 Pro CC U U AGG 16 CUA GAU Asp UAG STOP Asp Stop AUC Firs t Lett er Second Letter Thir d Lett er U C A G U UUU Ph e UCU Se r UAU Tyr UGU Cys U UUC UCC UAC UGC C UUA Le u UCA Se r UAA Sto p UGA Stop A UUG UCG UAG UGG Trp G C CUU Le u CCU Pr o CAU His CGU Arg U CUC CCC CAC CGC C CUA Le u CCA Pr o CAA Gln CGA Arg A CUG CCG CAG CGG G A AUU Il e ACU Th r AAU Asn AGU Ser U AUC ACC AAC AGC C AUA Il e ACA Th r AAA Lys AGA Arg A AUG Me t ACG AAG AGG G G GUU Va l GCU Al a GAU Asp GGU Gly U GUC GCC GAC GGC C GUA Va l GCA Al a GAA Glu GGA Gly A GUG GCG GAG GGG G The arrows in the above table indicate “codon - reverse codons” pairs: the reverse codon of a codon XYZ is defined as ZYX, where X, Y, Z can be any base. For instance, the reverse codon of GGA (Gly) is AGG (Arg). There exist 15 different amino acids in the rows 000, 101, 010 and 111 where the codon is reverse to itself, e.g. Lys (AAA), Tyr (UAU). REFERENCES Crick, F. H. C. (1968) The origin of the genetic code. J. Mol. Biol., 38, 367- 379. Halitsky, D. (2003) Extending the (hexa-)rhombic dodecahedral model of the genetic code: the code’s 6-fold degeneracies and the orthogonal projections of the 5-cube as 3- cube. Contributed paper (983-92-151), American Mathematical Society; and personal communication. Jungck, J.R. (1978) The genetic code as a periodic table. J. Mol. Evol., 11, 211-224. Lagerkvist, U. (1978) “Two out of three”: An alternative method for codon reading. Proc. Natl Acad. Sci. USA, 75, 1759-1762. Lagerkvist, U. (1981) Unorthodox codon reading and the evolution of the genetic code. Cell, 23, 305-306. Lowe, T. M., Eddy, S. R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955-964. http:// lowelab.ucsc.edu/GtRNAdb / Nikolajewa, S., Beyer, A., Friedel, M., Hollunder, J., and Wilhelm, T. (2005) Common patterns in type II restriction enzyme binding sites. Nucleic Acids Res., 33, 2726-2733. Sprinzl, M., Vassilenko, K. S., Emmerich, J., Bauer, F., (1999) Compilation of tRNA sequences and sequences of tRNA genes. Last update 2003. www.uni-bayreuth.de/departments/biochemie/trna / . Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2005) REBASE—restriction enzymes and DNA methyltransferases Nucleic Acids Res., 33, D230–D232 Thompson, J. D., Higgins, D. G., Gibson T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position- specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680. Wilhelm, T., Nikolajewa, S. L. (2004) A new classification scheme of the genetic code. J. Mol. Evol., 59, 598 -605 . Woese, C. R. (1967) The genetic code: The molecular basis for Genetic Expression. Harper & Row, New York. The point symmetry corresponds to Halitsky’s family – nonfamily symmetry operation (“E-M bifurcation”, Halitsky 2003). For instance, the family codon GUA (Val) is mapped into the nonfamily codon UGC (Cys). Thus, this point symmetry is behind the family – nonfamily symmetry in our scheme (shaded vs. unshaded regions). Interestingly, in 5 different mitochondrial codes (including yeast) AU(A/G) always encodes for Met and UG(A/G) encodes Trp. Thus, 32 positions code for 32 entries and each row contains exactly four different amino acids. In this spirit the mitochondrial code is the most regular one. 3. There are some anticodons, where the reverse anticodon has no tRNA, e.g. all 104 studied organisms have at least one tRNA for Tyr with anticodon GTA (some organisms, for instance H.sapiens, have different tRNAs with the same anticodon, altogether there are 189 different tRNAs with the anticodon GTA in the 104 species), but no organism has a tRNA for His with anticodon ATG. Table 2 lists different codon - reverse codon pairs. In the whole superkingdom bacteria the number of tRNA genes for the reverse part of table 2 is always zero, the few exceptions are only in eukaryotes and archaea. Another unique property of all bacteria is that there is no tRNA with anticodons for the self reverse codons UUU (Phe), UCU (Ser), UGU (Cys) and UAU (Tyr). Generally, table 2 shows that anticodons with an A at the third position are strongly preferred, whereas A** anticodons are significantly suppressed. The arrows represent “codon - reverse codons” pairs. Table 2. The amino acid codon- reverse codon pairs and the number of tRNA genes for the corresponding anticodons.

description

Asp. Stop. CUA. Institute of Molecular Biotechnology. Jena. AUC. GAU. UAG. Asp. STOP. THE PURINE-PYRIMIDINE CLASSIFICATION SCHEME REVEALS NEW PATTERNS IN THE GENETIC CODE. Swetlana Nikolajewa and Thomas Wilhelm - PowerPoint PPT Presentation

Transcript of Swetlana Nikolajewa and Thomas Wilhelm

Page 1: Swetlana Nikolajewa and Thomas Wilhelm

Swetlana Nikolajewa and Thomas Wilhelm Institute of Molecular Biotechnology Jena, Germany,

http://www.imb-jena.de/tsb

Jena

InstituteofMolecular Biotechnology

We present a new classification scheme of the genetic code, based on a binary representation of purines (A, G = 1) and pyrimidines (U, C = 0). On the basis of this logical organization we detect codon regularities (strong, mixed and weak codons) and patterns of amino acid properties and show symmetry characteristics of the genetic code. We present the “codon - reverse codon” pattern and use it to remove the only ambiguity which had remained in our recently proposed new classification scheme of the genetic code. The purine-pyrimidine scheme has now found its final form. We show that the “codon - reverse codon” pattern is correlated to aspects of the tRNA anticodon distribution in all organisms. It is well known that there is no tRNA with an anticodon for any of the STOP codons; we found that there is also no tRNA containing reverse STOP anticodons.

CodeCode Strong codonsStrong codons6 hydrogen bonds

Mixed codonsMixed codons5 hydrogen bonds

Mixed codonsMixed codons5 hydrogen bonds

Weak codonsWeak codons4 hydrogen bonds

000 Pro CC (C/U) Ser UC (C/U) Leu CU (C/U) Phe UU (C/U)

001 Pro CC (A/G) Ser UC (A/G) Leu CU (A/G) Leu UU (A/G)

100 Ala GC (C/U) Thr AC (C/U) Val GU (C/U) Ile AU (C/U)

101 Ala GC (A/G) Thr AC (A/G) Val GU (A/G) Ile / Met AU (A/G)

010 Arg CG (C/U) Cys UG (C/U) His CA (C/U) Tyr UA (C/U)

011 Arg CG (A/G) Stop / Trp UG (A/G) Gln CA (A/G) Stop UA (A/G)

110 Gly GG (C/U) Ser AG (C/U) Asp GA (C/U) Asn AA (C/U)

111 Gly GG (A/G) Arg AG (A/G) Glu GA (A/G) Lys AA (A/G)

The classification scheme of the genetic code is based on a binary representation of

purines  (G,A)        -   1, pyrimidines (C,U)  -   0.

The eight rows (23 = 8) and four columns are sufficient to place the 20 amino acids, as well as the termination codons.

The four combinations of the first column (CC*, GC*, CG*, and GG*) always imply 6 hydrogen bonds in complementary base-paring with the corresponding anticodon of the tRNA. Accordingly, they are called strong codons.

Each row contains exactly 4 different amino acids (including the stop codon).Exceptions in the standard code are the second row with two leucines and the fourth row with the AU* start codon.

The first two bases guarantee 5 H-bonds: mixed codons.

TThe codon-anticodon symmetryhe codon-anticodon symmetry: the red horizontal line marks the symmetry axis. For instance, codon CCC (Pro, first column, first row) has the anticodon GGG (Gly, first column, last row).

In 8 different mitochondrial codes UG(A/G) encodes Trp.

Weak codons have just 4 H-bonds .

The common table of the standard genetic code

“Codon - Reverse Codon” Pattern

1. The table can be divided into four blocks (codon - reverse codon groups) of the same size, for instance the upper left block with Pro (P), Ser (S), Ala (A) and Thr (T). Each block shows the same arrow pattern. All strongly evolutionary conserved groups of amino acids (Thompson et al., 1994) are subsets of exactly one codon - reverse codon group, e.g. the MILV amino acids belong to the upper right block in the table. The other conserved strong groups belonging to one block are STA, NEQK, NHQK, NDEQ, QHRK, MILF, HY. The only exception is FYW.

2. We studied all known tRNA genes of 104 different organisms (Sprinzl et al., 1999). It is known that the STOP codons do not have any tRNA. We found that there are also no tRNA genes containing anticodons reverse to the STOP anticodons (ACT, ATC and ATT in Table 2). The only exception is H. sapiens, possessing one tRNAAsn gene with the anticodon ATT, but humans also have three different possible suppressor tRNA genes (Lowe and Eddy, 1997).

Codons Reverse Codons

AA Anticodons # tRNAs AA Anticodons # tRNAs

Stop UUGA TCA 0 Ser AGUU ACT 0

Stop UUAG CTA 0 Asp GAUU ATC 0

Stop UUAA TTA 0 Asn AAUU ATT 1

Tyr UUAC GTA 189 His CAUU ATG 0

Trp UUGG CCA 144 Gly GGUU ACC 0

Leu UUUA TAA 127 Ile AUUU AAT 1

Leu UUUG CAA 134 Val GUUU AAC 18

Phe UUUC GAA 154 Leu CUUU AAG 28

Ser UUCA TGA 151 Thr ACUU AGT 19

Ser UUCC GGA 116 Pro CCUU AGG 16

CUA

GAUAsp

UAGSTOP

AspStop

AUC

FirstLetter

Second LetterThirdLetterU C A G

U

UUUPhe

UCUSer

UAUTyr

UGUCys

U

UUC UCC UAC UGC C

UUALeu

UCASer

UAAStop

UGA Stop A

UUG UCG UAG UGG Trp G

C

CUULeu

CCUPro

CAUHis

CGUArg

U

CUC CCC CAC CGC C

CUALeu

CCAPro

CAAGln

CGAArg

A

CUG CCG CAG CGG G

A

AUUIle

ACUThr

AAUAsn

AGUSer

U

AUC ACC AAC AGC C

AUA Ile ACAThr

AAALys

AGAArg

A

AUG Met ACG AAG AGG G

G

GUUVal

GCUAla

GAUAsp

GGUGly

U

GUC GCC GAC GGC C

GUAVal

GCAAla

GAAGlu

GGAGly

A

GUG GCG GAG GGG G

The arrows in the above table indicate “codon - reverse codons” pairs: the reverse codon of a codon XYZ is defined as ZYX, where X, Y, Z can be any base. For instance, the reverse codon of GGA (Gly) is AGG (Arg). There exist 15 different amino acids in the rows 000, 101, 010 and 111 where the codon is reverse to itself, e.g. Lys (AAA), Tyr (UAU).

REFERENCES

Crick, F. H. C. (1968) The origin of the genetic code. J. Mol. Biol., 38, 367- 379.Halitsky, D. (2003) Extending the (hexa-)rhombic dodecahedral model of the genetic code: the code’s 6-fold degeneracies and the orthogonal projections of the 5-cube as 3-cube. Contributed paper (983-92-151), American Mathematical Society; and personal communication.Jungck, J.R. (1978) The genetic code as a periodic table. J. Mol. Evol., 11, 211-224. Lagerkvist, U. (1978) “Two out of three”: An alternative method for codon reading. Proc. Natl Acad. Sci. USA, 75, 1759-1762. Lagerkvist, U. (1981) Unorthodox codon reading and the evolution of the genetic code. Cell, 23, 305-306.Lowe, T. M., Eddy, S. R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955-964. http://lowelab.ucsc.edu/GtRNAdb/ Nikolajewa, S., Beyer, A., Friedel, M., Hollunder, J., and Wilhelm, T. (2005) Common patterns in type II restriction enzyme binding sites. Nucleic Acids Res., 33, 2726-2733.Sprinzl, M., Vassilenko, K. S., Emmerich, J., Bauer, F., (1999) Compilation of tRNA sequences and sequences of tRNA genes. Last update 2003. www.uni-bayreuth.de/departments/biochemie/trna/ . Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2005) REBASE—restriction enzymes and DNA methyltransferases Nucleic Acids Res., 33, D230–D232Thompson, J. D., Higgins, D. G., Gibson T. J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.Wilhelm, T., Nikolajewa, S. L. (2004) A new classification scheme of the genetic code. J. Mol. Evol., 59, 598 -605 .Woese, C. R. (1967) The genetic code: The molecular basis for Genetic Expression. Harper & Row, New York.

The point symmetry corresponds to Halitsky’s family – nonfamily symmetry operation (“E-M bifurcation”, Halitsky 2003). For instance, the family codon GUA (Val) is mapped into the nonfamily codon UGC (Cys). Thus, this point symmetry is behind the family – nonfamily symmetry in our scheme (shaded vs. unshaded regions).

Interestingly, in 5 different mitochondrial codes (including yeast) AU(A/G) always encodes for Met and UG(A/G) encodes Trp. Thus, 32 positions code for 32 entries and each row contains exactly four different amino acids. In this spirit the mitochondrial code is the most regular one.

3. There are some anticodons, where the reverse anticodon has no tRNA, e.g. all 104 studied organisms have at least one tRNA for Tyr with anticodon GTA (some organisms, for instance H.sapiens, have different tRNAs with the same anticodon, altogether there are 189 different tRNAs with the anticodon GTA in the 104 species), but no organism has a tRNA for His with anticodon ATG. Table 2 lists different codon - reverse codon pairs. In the whole superkingdom bacteria the number of tRNA genes for the reverse part of table 2 is always zero, the few exceptions are only in eukaryotes and archaea. Another unique property of all bacteria is that there is no tRNA with anticodons for the self reverse codons UUU (Phe), UCU (Ser), UGU (Cys) and UAU (Tyr). Generally, table 2 shows that anticodons with an A at the third position are strongly preferred, whereas A** anticodons are significantly suppressed.

The arrows represent “codon - reverse codons” pairs.

Table 2. The amino acid codon- reverse codon pairs and the number of tRNA genes for the corresponding anticodons.