Nomenclature for cloned plant genes




Cis and Trans 99

7th International Conference on Plant Pathogenic Bacteria, 11-16 June 1989, Budapest, Hungary ($1500).

Applications for meeting support may be made to the ISPMB office in Athens, Georgia, USA.

Nomenclature for Cloned Plant Genes

Nomenclature! There are few topics that can excite the same fever-pitch of excitement among red-blooded molecular biologists as nomenclature! Your eyes glaze over, the jaw grows slack. Surely an editor who is under the gun to keep the size of the journal from outgrowing its budget could find a better use for space than writing about nomenclature! Call me an obsessed prophet of doom, but I think that, along with global warming and state terrorism, the present anarchy in the nomenclature of plant genes could spawn an authentic crisis by the end of this century. Specifically, dear colleagues, we need to adopt a common system for naming cloned plant genes.

The problem is that plant genes are being isolated, cloned, and sequenced at an exponentially increasing rate. With the development of methods for automated sequencing, the rate of sequencing, especially of agriculturally important species, will increase even more rapidly. The sequences are deposited in the EMBL Data Library or GenBank (which regularly update one another), where we can compare them with sequences coming from our laboratories. Homology (a less presumptive term is similarity) with a previously reported sequence may be very informative, but we need particularly to know what the gene is and what it codes for.

Higher plants are thought to contain more than 10,000 genes. (Plant molecular biologists are also concerned with lower plants, cyanobacteria, and other prokaryotes, but let us restrict ourselves for the moment to higher plants.) Given that all higher plants appear to have evolved from common ancestors, we can expect that many genes in different plants are truly homologous in the evolutionary sense of that term and that many genes in different plants will, therefore, have similar sequences. This is clearly the case among seed storage proteins, where widely divergent plants contain very similar vicilin-like and legumin-like proteins, although the names given to these proteins are all different. Finding one's way among the cloned genes for these proteins would clearly be simplified if we could agree that homologous sequences deserve common designations.

The naming of the few

Until a few years ago, genes were quite properly the province of geneticists. The identification of a gene required the identification of a heritable character; a genetic locus required the segregation of that character from other known characters. Among higher plants, maize has been the best studied. By 1987 about 650 loci had been mapped in the nuclear genome of maize; this represents perhaps 5 percent of the total number of genes. Almost a fifth of the mapped loci affect the electrophoretic mobility of an enzyme (isozyme characters) and could, therefore, be structural genes for those proteins. A smaller number affect other biochemical characters, such as anthocyanin accumulation. Most of the remainder are morphological characters.

In the early days of maize genetics it was quite acceptable to identify a gene by one or two letters: B designated a character in which anthocyanin accumulated in major tissues; bm was brown midrib; d was dwarf; E affected esterase; o, opaque endosperm; su, sugary. Lower case referred to a recessive gene, upper case to a dominant gene. When additional loci that produced the same phenotype were discovered, a number was added: d3, E8, o2, su1, etc. But one or two letters per gene limits one to 26 or 676 genes, respectively, so that a system in which genes are represented by one or two letters will run out of symbols. More recently maize geneticists agreed to use three letters (= 17,576 symbols), but the earlier designations were allowed to stand. Only a handful of the genetic loci in maize have been isolated from genomic libraries and sequenced.
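The arithmetic behind these limits is easy to check. The short Python sketch below counts the symbols available from a 26-letter alphabet at each symbol length; the function name `symbol_capacity` is an illustrative assumption, and the count ignores letter case, as the figures quoted above do.

```python
ALPHABET_SIZE = 26  # letters A-Z, ignoring case as the figures in the text do


def symbol_capacity(length: int) -> int:
    """Return the number of distinct symbols of exactly `length` letters."""
    return ALPHABET_SIZE ** length


# Capacity for one-, two-, and three-letter gene symbols.
capacities = {n: symbol_capacity(n) for n in (1, 2, 3)}
for n, count in capacities.items():
    print(f"{n}-letter symbols: {count}")
```

Only the three-letter scheme exceeds the estimate of more than 10,000 genes in a higher plant, and then only if nearly every letter combination is put to use.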

Brave new world

Plastid genomes of higher plants contain about 85 genes of which 75 have been identified, all by molecular methods. There is an obvious explanation for the dramatic discrepancy between the fractions of nuclear and plastid genes which have been identified: plastid genomes are small enough to be totally mapped and cloned; plastid DNA from three species has been totally sequenced, whereas nuclear genomes are still dauntingly large. It is evident nonetheless that, as mapping, cloning, and sequencing of DNA become more routine, the numbers of nuclear genes to be identified, mapped, and sequenced will increase rapidly. Within a few years we can expect that the number of loci identified by molecular methods will outstrip those identified by traditional methods.

We can also expect a qualitative change in the kinds of genes to be identified. Most genetic traits identifiable by breeding correspond to form or color, are due to single genes, and mutations in the corresponding genes are usually non-lethal. Genes for housekeeping functions are usually excluded. In contrast the first genes to be isolated from higher plants code for super-abundant proteins, such as the chlorophyll a/b-binding protein and seed reserve proteins, and they typically occur in multi-gene families. Only recently has it become possible to isolate genes made famous in classical genetics, such as sugary and opaque-2. Unless qualitatively new techniques greatly simplify the task of isolating genes controlling growth and form, that distinguish a pea from a bean, that distinguish a prostrate from an upright yew, or that cause a flower to have three petals rather than five, genes for morphological characters may be the most difficult and among the last to be fished out of genomic libraries. Most of the vast numbers of plant genes to be isolated in the next five or ten years, therefore, will probably code for relatively abundant proteins.

Thus, a principal problem in organizing sequences of plant DNA in the immediate future will be comparing sequences that are likely to code for proteins whose functions are similar among different plants and, in some instances, homologous to those of other kinds of organisms. That is, we are more likely to be contending with genes that are common among higher plants than with those that distinguish individual species. We need, therefore, to develop a nomenclature for cloned plant genes that will assist in establishing affiliations rather than one that assumes every species to be distinct.

Problems with existing nomenclatures of plant genes

The first problem with the genetic nomenclature of higher plants is that there are too many of them: too many nomenclatures, not too many plants. Geneticists of each of the major crops developed their own nomenclature and, since tomato and maize could not be hybridized, there was no necessity to ensure that the various nomenclatures were congruent. Even in closely related plants (e.g., tomato and pepper), genes affecting similar characters are given different names. A second problem is one of sheer numbers: although the 17,576 symbols afforded by permutations among three letters would seem at first glance to be large enough to provide three-letter mnemonics for a large number of gene products or phenotypes (e.g., Adh = alcohol dehydrogenase, hcf = high chlorophyll fluorescence), the system does not provide a ready means of relating genes for different enzymes along a biosynthetic pathway or of subunits of a multimeric structure (e.g., enzymes of carotenoid biosynthesis or ribosomal proteins). It also does not provide a ready means of relating regulatory, structural, and transport genes affecting a single protein or process. (Adr is a maize locus that regulates the expression of Adh.) In effect, the three-letter mnemonic is shrunk to two letters. In some nomenclatures related genes are distinguished with superscripts or subscripts. These modifiers can introduce serious ambiguity in a computer search.
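The search ambiguity can be illustrated with a small sketch in Python. It assumes that a database stores symbols as plain text and uses Unicode compatibility normalization (NFKC) as a stand-in for whatever flattening a real system performs; the symbol `Xyz` and the function name `flatten` are hypothetical and do not refer to any real gene or database.

```python
import unicodedata


def flatten(symbol: str) -> str:
    """Reduce a typeset symbol to plain text, as a text-only database might.

    NFKC normalization is used here only as a stand-in for that flattening.
    """
    return unicodedata.normalize("NFKC", symbol)


# Three hypothetical typeset forms that a nomenclature might treat as distinct:
# a plain trailing digit, a superscript modifier, and a subscript modifier.
typeset_forms = ["Xyz1", "Xyz\u00b9", "Xyz\u2081"]

# After flattening, all three collapse to a single search key.
search_keys = {flatten(form) for form in typeset_forms}
print(search_keys)
```

Once the distinction carried by typography is lost, a search for one form retrieves all three, which is the kind of ambiguity a nomenclature built for computerized searches would need to avoid.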

Nomenclature of bacterial genes

In contrast to the systems of gene nomenclature developed for crop plants, the bacterial system is widely used, is free of some of the limitations and ambiguities described above (although it has some of its own), and is one with which molecular geneticists are familiar. It has been adopted with modifications by researchers working with cyanobacteria (cf. Tandeau de Marsac & Houmard, 1987), chloroplasts (Hallick & Bottomley, 1983), and plant mitochondria (Lonsdale & Leaver, 1988). For readers having only a casual familiarity with the bacterial system, a brief review might be helpful:

The current system of nomenclature for bacterial genetics was proposed by Demerec et al. (1966). The set of genes whose mutants produce a given phenotypic effect is designated by a three-letter, lower-case, italicized symbol; e.g., genes affecting glutamine metabolism are designated gln. Different members of a set are distinguished by adding an italicized capital letter, e.g., glnA, glnF, lacZ, polC, phoR. Different alleles at a single locus are designated by a number: glnA4, glnA7.
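The structure of such a symbol lends itself to mechanical parsing. The following Python sketch splits a plain-text symbol into the three parts described above; it covers only those parts (italics cannot be represented in plain text), and the names `GeneSymbol` and `parse_symbol` are illustrative assumptions rather than part of the Demerec proposal.

```python
import re
from typing import NamedTuple, Optional


class GeneSymbol(NamedTuple):
    gene_set: str           # three lower-case letters, e.g. "gln"
    member: Optional[str]   # capital letter for the locus, e.g. "A"
    allele: Optional[int]   # allele number at that locus, e.g. 4


# Three lower-case letters, optionally followed by one capital letter,
# which may in turn be followed by an allele number.
_SYMBOL = re.compile(r"([a-z]{3})(?:([A-Z])(\d+)?)?")


def parse_symbol(symbol: str) -> GeneSymbol:
    """Split a bacterial-style gene symbol into set, member, and allele."""
    match = _SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a bacterial-style gene symbol: {symbol!r}")
    gene_set, member, allele = match.groups()
    return GeneSymbol(gene_set, member, int(allele) if allele else None)


for text in ("gln", "glnA", "lacZ", "glnA4", "glnA7"):
    print(text, parse_symbol(text))
```

Because every symbol shares a three-letter prefix with the other members of its set, a search on the prefix alone retrieves all related loci.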

A major limitation of the bacterial system of nomenclature is the short shrift given to multi-gene families, which are such a familiar feature of plant genomes. Ribosomal RNA cistrons are the only common instance where multi-gene families occur in bacteria, so that it does not cause unacceptable confusion to designate multiple copies of these genes as rrnA, rrnB, rrnC, etc., even though the parallel symbols rpoA, rpoB, and rpoC designate genes that code for distinct subunits of RNA polymerase.

Free deposit, high interest, unlimited borrowing

Although we have come to rely on the EMBL Data Bank and GenBank for identifying homologous sequences, these admirable institutions do not solve the problem of gene nomenclature for us. Entry names contain an abbreviation of the gene name as supplied by the contributor, but would a naive reader recognize that the following entries

Athlpcpl Arabidopsis thaliana gene (LHCP AB 165) for chlorophyll a/b binding protein. 6/87 999bp

Lgiab19 Lemna gibba chlorophyll a/b apoprotein gene, complete cds. 9/86 1913bp

refer to the same gene? In this instance confusion arises because of ambiguity in the naming of a protein. It may be unrealistic to expect biochemists to agree among themselves, but we can hope that plant molecular biologists, for whom the stakes are higher, can be persuaded to adopt a common system.

Recommendations

If you have followed the arguments this far, you may be receptive to a proposal that the ISPMB sponsor the development of a nomenclature for cloned plant genes. I think the structure of the nomenclature should be large enough to accommodate at least 10,000 names, incorporate simple mnemonic codes, lend itself to computerized searches, reflect common ancestry and/or function, and be compatible as far as possible with existing systems, especially those for bacteria and maize. There are many general questions to be resolved, such as whether plant names should be included in the gene name and, if so, how. Many specific questions will need to be addressed by workers directly involved in different areas of plant molecular biology.

The Reporter will serve as a clearing house and will actively solicit proposals and criticisms. I think it is reasonable to expect that a system could be sufficiently advanced by 1992 to be ratified at the Third Congress of the ISPMB. We shall look forward to hearing from you.

-- C.A.P.

References

Demerec, M., E.A. Adelberg, A.J. Clark, and P.E. Hartman. 1966. A proposal for a uniform nomenclature in bacterial genetics. Genetics 54:61-76.

Hallick, R.B., and W. Bottomley. 1983. Proposals for the naming of chloroplast genes. Plant Mol. Biol. Rep. 4:38-43.

Lonsdale, D.M., and C.J. Leaver. 1988. Mitochondrial gene nomenclature. Plant Mol. Biol. Rep. 7(2):14-21.

Maize Genetics Cooperation News Letter. 1987. Vol. 61. Dept. Agron. and U.S.D.A., Univ. Missouri, Columbia, Missouri.

Tandeau de Marsac, N., and J. Houmard. 1987. Advances in cyanobacterial molecular genetics, in P. Fay and C. Van Baalen, eds. The Cyanobacteria. Elsevier, Amsterdam.