

Vol. 152, No. 3
JOURNAL OF BACTERIOLOGY, Dec. 1982, p. 1211-1219
0021-9193/82/121211-09$02.00/0
Copyright © 1982, American Society for Microbiology

Identification of Two Genes Immediately Downstream from the polA Gene of Escherichia coli

CATHERINE M. JOYCE* AND NIGEL D. F. GRINDLEY

Department of Molecular Biophysics and Biochemistry, Yale University Medical School, New Haven, Connecticut 06510

Received 30 April 1982/Accepted 11 August 1982

We have identified two genes within a 1-kilobase region immediately following the polA gene of Escherichia coli. The first, whose transcription is initiated about 150 base pairs beyond the end of the polA coding sequence, is the gene corresponding to the previously sequenced "spot 42 RNA" (B. G. Sahagan and J. E. Dahlberg, J. Mol. Biol. 131:573-592, 1979). The second, located further downstream and transcribed towards polA, is the structural gene for a 22-kilodalton polypeptide, which we have detected by using plasmid-directed protein synthesis in maxicells. Sequence analysis of this region of the E. coli genome suggests that it contains little, if any, redundant DNA.

During our sequencing studies of the polA region of Escherichia coli (15), we discovered a sequence starting about 150 nucleotides beyond the end of the polA coding region that corresponds to that of the previously characterized "spot 42 RNA" (25). At present, very little is known about the function of spot 42 RNA. Sahagan and Dahlberg (26) have shown that it is both more abundant and more stable than bacterial mRNAs, though less so than rRNA or tRNA. Cell fractionation studies (26) indicate that spot 42 RNA is associated both with ribosomes and with the bacterial nucleoid. Production of spot 42 RNA appears to be negatively regulated by cyclic AMP and unaffected by stringent control. Analysis of the 109-nucleotide spot 42 RNA synthesized in vivo shows that it is a primary transcript (25), and therefore its expression must be independent of the preceding polA gene.

Examination of the spot 42 RNA sequence indicates that it has the coding capacity for a 15-residue peptide, with the AUG initiator triplet preceded by a polypurine tract that could serve as a ribosome binding site (30). Translation of this RNA would give a product rich in hydrophobic amino acids, particularly leucine (see sequence in Fig. 1). Sahagan and Dahlberg (25, 26) noted the analogy between spot 42 RNA and the leader RNAs of biosynthetic operons regulated by attenuation (38) and suggested that spot 42 RNA might be a leader transcript for an operon involved in the metabolism of hydrophobic amino acids.
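The coding capacity described above can be checked by translating the reading frame with the standard genetic code. The following is a minimal sketch, not part of the original work: the DNA string is copied from the top strand of Fig. 1 (from the ATG through the TGA stop), and the names `CODON_TABLE`, `translate`, and `SPOT42_ORF` are hypothetical helpers introduced only for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base until the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# Assumption: reading frame as printed on the top strand of Fig. 1.
SPOT42_ORF = "ATGTTCTATCTTTCAGACCTTTTACTTCACGTAATCGGATTTGGCTGA"

peptide = translate(SPOT42_ORF)
leucine_count = peptide.count("L")
print(peptide, len(peptide), leucine_count)
```

Under these assumptions the one-letter peptide has 15 residues, four of them leucine, which agrees with the three-letter sequence printed in Fig. 1.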

In this paper we present the data that led to our identification of the spot 42 gene. (Hybridization of labeled spot 42 RNA to restriction endonuclease digests of E. coli DNA [23, 26] indicates that this must be the only chromosomal location of the spot 42 gene.) We also describe DNA sequence data from the region downstream of spot 42, which suggests that spot 42 RNA does not function as a leader RNA.

MATERIALS AND METHODS

Abbreviations. The following abbreviations are used: bp, base pair(s); kb, kilobase pair(s); kd, kilodaltons.

Suppliers. Restriction endonucleases were obtained from New England Biolabs or Bethesda Research Laboratories and were used as recommended by the supplier. S1 nuclease was from Sigma Chemical Co. and T4 DNA ligase was from Boehringer-Mannheim. E. coli RNA polymerase, purified according to Burgess and Jendrisak (5), was the kind gift of Terry Platt.

Construction of plasmids. Plasmid pCJ1 (described in reference 15) contains a 5-kb HindIII fragment carrying the E. coli polA gene (polA1 allele) inserted at the HindIII site of the pBR322-derived vector, pNG16 (10). Plasmid pCJ4 was derived from pCJ1 by joining the BglII site just upstream of the polA structural gene to the BamHI site of the vector (Fig. 2). The construction of pCJ40 is described in Fig. 2 and in Results. Ligation reactions were carried out at low DNA concentration (<2 pmol of ends per 100-µl reaction) to favor unimolecular ligation events. Single-stranded ends of restriction fragments were removed by digestion with S1 nuclease (0.3 U/µg of DNA) for 30 min at 37°C in a reaction mixture containing 30 mM sodium acetate (pH 4.5), 250 mM sodium chloride, 1 mM zinc chloride, and about 0.15 mg of DNA per ml. The large 7.2-kb fragment from pCJ4-γδ2 was purified by electrophoresis on a 1% agarose gel in 100 mM Tris-borate (pH 8.3)-1 mM EDTA and isolated by electroelution followed by chromatography on a 50- to 100-µl column of DEAE-cellulose. Recombinant plasmids were iso-


Downloaded from http://jb.asm.org/ on May 10, 2020 by guest


ValGlySerGlyGluAsnTrpAspGlnAlaHis
GTGGGGAGTGGCGAAAACTGGGATCAGGCGCACTAAGATTCGCCTGAACATGCCTTTTTTCGTAAGTAAGCAACATAAGCTGTCACGTT 2840
CACCCCTCACCGCTTTTGACCCTAGTCCGCGTGATTCTAAGCGGACTTGTACGGAAAAAAGCATTCATTCGTTGTATTCGACAGTGCAA

TTGTGATGGCTATTAGAAATTCCTATGCAACAACTGAAAAAAAATTACAAAAAGTGCTTTCTGAACTGAACAAAAAAGAGTAAAGTTAGT 2930
AACACTACCGATAATCTTTAAGGATACGTTGTTGACTTTTTTTTAATGTTTTTCACGAAAGACTTGACTTGTTTTTTCTCATTTCAATCA

                     MetPheTyrLeuSerAspLeuLeuLeuHisValIleGlyPheGly
CGCGTAGGGTACAGAGGTAAGATGTTCTATCTTTCAGACCTTTTACTTCACGTAATCGGATTTGGCTGAATATTTTAGCCGCCCCAGTCA 3020
GCGCATCCCATGTCTCCATTCTACAAGATAGAAAGTCTGGAAAATGAAGTGCATTAGCCTAAACCGACTTATAAAATCGGCGGGGTCAGT

GTAATGACTGGGGCGTTTTTTATTGGGCGAAAGAAAAGATCCGTAATGCCTGATGCGCTATGTTTATCAGGCCAACGGTAGAATTGTAAT 3110
CATTACTGACCCCGCAAAAAATAACCCGCTTTCTTTTCTAGGCATTACGGACTACGCGATACAAATAGTCCGGTTGCCATCTTAACATTA

CTATTGAATTTACGGGCCGGATACGCCACATCCGGCACAAGCATTAAGGCAAGAAAATTATTCGCCGTCCTGCGTTTCTTCTACAGGCTG 3200
GATAACTTAAATGCCCGGCCTATGCGGTGTAGGCCGTGTTCGTAATTCCGTTCTTTTAATAAGCGGCAGGACGCAAAGAAGATGTCCGAC
                                                            GluGlyAspGlnThrGluGluValProGln

CATCTCGCTAAACCAGGTATCCAGTTTCTGCCGCAGCTTGTCCACGCCTTGTTTCTTCAACGAAGAAAACGTTTCAACCTGCACATCACC 3290
GTAGAGCGATTTGGTCCATAGGTCAAAGACGGCGTCGAACAGGTGCGGAACAAAGAAGTTGCTTCTTTTGCAAAGTTGGACGTGTAGTGG
MetGluSerPheTrpThrAspLeuLysGlnArgLeuLysAspValGlyGlnLysLysLeuSerSerPheThrGluValGlnValAspGly

GTTAAACGCCAGTACAGCTTCACGCACCATATTCAATTGCGCTTTACGTGCGCCGCTTGCCAGTTTGTCCGCTTTGGTCAGCAGCACCAG 3380
CAATTTGCGGTCATGTCGAAGTGCGTGGTATAAGTTAACGCGAAATGCACGCGGCGAACGGTCAAACAGGCGAAACCAGTCGTCGTGGTC
AsnPheAlaLeuValAlaGluArgValMetAsnLeuGlnAlaLysArgAlaGlySerAlaLeuLysAspAlaLysThrLeuLeuValLeu

AACGGCGATATTGCTGTCTACCGCCCACTCAATCATCTGCTGATCCAAATCTTTCAGCGGATGGCGAATATCCATTAGCACCACCAGACC 3470
TTGCCGCTATAACGACAGATGGCGGGTGAGTTAGTAGACGACTAGGTTTAGAAAGTCGCCTACCGCTTATAGGTAATCGTGGTGGTCTGG
ValAlaIleAsnSerAspValAlaTrpGluIleMetGlnGlnAspLeuAspLysLeuProHisArgIleAspMetLeuValValLeuGly

TTGCAGGCTCTGACGTTTTTCGAGGTATTCGCCGAGCGCACGCTGCCATTTGCGCTTCATCTCTTCCGGGACTTCCGCATAACCGTACCC 3560
AACGTCCGAGACTGCAAAAAGCTCCATAAGCGGCTCGCGTGCGACGGTAAACGCGAAGTAGAGAAGGCCCTGAAGGCGTATTGGCATGGG
GlnLeuSerGlnArgLysGluLeuTyrGluGlyLeuAlaArgGlnTrpLysArgLysMetGluGluProValGluAlaTyrGlyTyrGly

AGGCAAGTCAACCAGACGCTTGCCGTCAGCCACTTCAAACAGGTTGATAAGCTGGGTGCGCCCTGGGGTTTTTGAGGTACGAGCCAGGCT 3650
TCCGTTCAGTTGGTCTGCGAACGGCAGTCGGTGAAGTTTGTCCAACTATTCGACCCACGCGGGACCCCAAAAACTCCATGCTCGGTCCGA
ProLeuAspValLeuArgLysGlyAspAlaValGluPheLeuAsnIleLeuGlnThrArgGlyProThrLysSerThrArgAlaLeuSer

TTTCTGGTTAGTCAGCGTGTTCAGCGCGCTGGATTTACCTGCGTTGGAACGGCCTGCAAAAGCCACTTCAATTCCGGTATCGGAAGGTAG 3740
AAAGACCAATCAGTCGCACAAGTCGCGCGACCTAAATGGACGCAACCTTGCCGGACGTTTTCGGTGAAGTTAAGGCCATAGCCTTCCATC
LysGlnAsnThrLeuThrAsnLeuAlaSerSerLysGlyAlaAsnSerArgGlyAlaPheAlaValGluIleGlyThrAspSerProLeu

GTGGCGAATATCAGGCGCACTCATCACAAAATGCGTCTGTTGATAATTCAAATTAGTCAAAGCGGTCGTCTCCGTCAGTCAAAGCTT 3827
CACCGCTTATAGTCCGCGTGAGTAGTGTTTTACGCAGACAACTATTAAGTTTAATCAGTTTCGCCAGCAGAGGCAGTCAGTTTCGAA
HisArgIleAspProAlaSerMet

FIG. 1. DNA sequence downstream from the polA gene. The spot 42 RNA transcript is indicated by the line from nucleotides 2,934 to 3,042. Amino acid sequences are shown for the C-terminal portion of polA (encoded by the top strand), the potential translation product of spot 42 RNA (top strand), and the 22-kd reading frame (bottom strand). The numbering corresponds to that of our previously published polA sequence (15).
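The transcript coordinates in the legend can be turned into a simple consistency check against the 109-nucleotide length quoted in the introduction. The sketch below is illustrative only. It assumes that the two top-strand rows of Fig. 1 ending at 3,020 and 3,110 each hold 90 nucleotides as printed, so the first row starts at 2,931; the helper `slice_by_coord` and the constant names are hypothetical.

```python
# Top-strand rows of Fig. 1 ending at 3,020 and 3,110, copied as printed.
ROW_TO_3020 = ("CGCGTAGGGTACAGAGGTAAGATGTTCTATCTTTCAGACCTTTTACTTCACGTAATCGGA"
               "TTTGGCTGAATATTTTAGCCGCCCCAGTCA")
ROW_TO_3110 = ("GTAATGACTGGGGCGTTTTTTATTGGGCGAAAGAAAAGATCCGTAATGCCTGATGCGCTA"
               "TGTTTATCAGGCCAACGGTAGAATTGTAAT")
FIRST_COORD = 2931  # assumed coordinate of the first base of ROW_TO_3020


def slice_by_coord(seq, first_coord, start, end):
    """Return the bases from start to end (inclusive) in figure numbering."""
    return seq[start - first_coord:end - first_coord + 1]


top_strand = ROW_TO_3020 + ROW_TO_3110
spot42_dna = slice_by_coord(top_strand, FIRST_COORD, 2934, 3042)
spot42_rna = spot42_dna.replace("T", "U")  # RNA has the top-strand sequence

print(len(spot42_rna))
print(spot42_rna)
```

With these inputs the slice is 109 nucleotides long, begins GUAGGG and ends in a run of six U residues followed by A.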

lated from 1.5-ml overnight cultures by the alkaline lysis method (3) and screened by digestion with appropriate restriction enzymes.

Isolation and characterization of γδ insertions in plasmid pCJ4. MG1063 (F+ recA56) served as the source of F carrying an active copy of the 5.7-kb transposable element γδ (12). MG1063 containing pCJ4 was used as the donor in a mating with S165 (F- his ΔgalS165 rpsL). Approximately equal quantities of donor and recipient cells (at 2 × 10^8 to 5 × 10^8 cells per ml) were mixed and incubated at 37°C for 3 to 4 h. The cells were harvested by centrifugation, washed once to remove extracellular β-lactamase, and plated on MacConkey-galactose plates containing carbenicillin (250 µg/ml) and streptomycin (250 µg/ml). Plasmid DNA from individual transconjugants was analyzed by digestion with HaeIII to identify those in which γδ had inserted into the 576-bp HaeIII fragment (3,127 to 3,702) that almost exactly coincides with the 22-kd reading frame (Fig. 3). Since there are HaeIII sites 12 bp from either end of γδ (22), we were able, by measuring the size of the polA-γδ junction fragments, to determine the two possible (symmetrically related) sites of each insertion. We subsequently used an HpaII digest to distinguish between the two possible locations in each case.
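The arithmetic behind the two symmetrically related candidate sites can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' procedure: a junction fragment is taken to be the 12 bp of γδ between its end and the HaeIII site plus the flanking chromosomal DNA up to the boundary of the 576-bp fragment, the short target-site duplication made on transposition is ignored, and exact cut positions within the recognition sites are not modeled. The function name and the example fragment size are hypothetical.

```python
HAEIII_FRAGMENT = (3127, 3702)  # fragment coordinates given in the text
GD_END_TO_HAEIII = 12           # HaeIII site lies 12 bp from each end of gamma-delta


def candidate_insertion_sites(junction_bp, fragment=HAEIII_FRAGMENT,
                              offset=GD_END_TO_HAEIII):
    """Return the two mirror-image insertion coordinates for one junction size."""
    left, right = fragment
    flank = junction_bp - offset  # chromosomal DNA in the junction fragment
    if not 0 <= flank <= right - left + 1:
        raise ValueError("junction fragment size outside the expected range")
    # The flank may be measured from either boundary of the HaeIII fragment.
    return (left + flank, right - flank)


# Hypothetical measurement: a 485-bp junction fragment.
sites = candidate_insertion_sites(485)
print(sites)
```

A single junction size cannot tell the two candidates apart, which is why a second digest with a different enzyme (HpaII in the text) is needed.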

In vitro transcription and analysis of transcripts. The 650-bp HpaII fragment from pCJ1 (see Fig. 3B) was purified by electrophoresis on a 5% polyacrylamide gel (30:1 acrylamide-bis) in 50 mM Tris-borate (pH 8.3)-0.5 mM EDTA. DNA was eluted from the crushed gel slice by soaking in high-salt buffer (19) and was extracted with phenol before use. In vitro transcription reactions, using this fragment as template, were carried out and the products were analyzed as described by Wu et al. (37). RNase T1 fingerprint analysis of transcripts was performed as described by Squires et al. (33).

DNA sequencing. Labeled DNA fragments were isolated from plasmid pCJ1 and sequenced by the method of Maxam and Gilbert (19), as previously described (15).

Plasmid-directed protein synthesis in maxicells. CSR603 (uvrA6 recA1 phr-1) was transformed with the appropriate plasmids. Cells were grown, UV irradiated, and labeled with [35S]methionine as described by Sancar et al. (29). The protein products were analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (17).

In vivo assay for promoter sequences. The procedures developed by McKenney et al. (20) for cloning promoter fragments and assaying their function were followed. Plasmid pKO-1 and the appropriate gal host strains were kindly provided by K. McKenney and M. Rosenberg (National Institutes of Health, Bethesda, Md.). The 260-bp HindIII-HincII fragment containing the start of the 22-kd coding region (see Fig. 3B) was inserted between the HindIII and SmaI sites of the promoter-cloning vector pKO-1. Plasmid clones were analyzed by digestion with HaeIII and HpaII, both of which cut within the cloned fragment. Four independently isolated plasmids with the desired structure were introduced into N100 (galK recA), and the resulting transformants were examined qualitatively on

[Fig. 2 diagram. Labels: ApR; BamHI. Steps shown: (1) BamHI + BglII, (2) T4 DNA ligase; γδ insertion; (1) BamHI + HindIII, (2) S1 nuclease, (3) isolate 7.2-kb HindIII-BamHI fragment, (4) T4 DNA ligase.]

FIG. 2. Construction of plasmid pCJ40, in which the start of the 22-kd reading frame is deleted. (Thin line) pNG16 vector sequences; (heavy line) E. coli DNA containing the polA region; (hatched region) 22-kd reading frame. DNA of the transposable element γδ is represented by the double line (not to scale). R = EcoRI; H = HindIII. Details of the construction are given in the text. The vector region of these plasmids codes for the synthesis of β-lactamase (30 kd), the product of the ApR determinant. The cloned region contains the polA1 amber allele which produces an amber fragment of DNA polymerase I (36 kd) as well as the intact enzyme (103 kd) as a readthrough product. These products are noted in Fig. 5.


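[Fig. 3 map. Scale: nucleotides 2,700 to 3,827 (marks at 2,700, 3,000, 3,500, and 3,827), with the end of polA and the spot 42 RNA gene indicated. (A) Restriction sites: HinfI, Sau3AI, HaeIII, HpaII, HindIII, HincII. (B) 650-bp transcription template; fragment cloned in pKO-1. (C) Open reading frames.]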

FIG. 3. Sequence organization of the region downstream from the polA gene. (A) The vertical bars indicate restriction sites used in this work. The horizontal arrows show the extent of DNA sequence determined from each 5'-end-labeled fragment. (B) Restriction fragments used in other experiments described in the text. The 650-bp HpaII fragment (2,477 to 3,128 on the published polA sequence [15]) was used as a template for in vitro transcription. The 260-bp HincII-HindIII fragment (3,569 to 3,826) was cloned in the vector pKO-1 to assay for promoter function. (C) Open reading frames in the DNA sequence downstream from the spot 42 RNA gene. Only those that have an initiation codon and could give rise to a product longer than 40 amino acids are shown.

MacConkey-galactose plates and quantitatively by assaying galactokinase activity in a crude cell lysate.

RESULTS

Characterization of in vitro transcripts. Our attention was initially directed to the region following the polA gene when we noticed a sequence typical of a rho-independent terminator (21), a guanine-cytosine-rich region of dyad symmetry, followed on one strand by a run of T's (nucleotides 3,010 to 3,044, Fig. 1). Although the orientation of this sequence was such that it could serve as a transcription terminator for the polA gene, we doubted that this was the case since the sequence was more than 200 nucleotides beyond the end of the polA coding sequence. In vitro transcription of an HpaII restriction fragment covering this region (see Fig. 3B) gave a good yield of a labeled RNA about 100 nucleotides in length (Fig. 4).

Our first two-dimensional fingerprint analysis of a T1 RNase digest of the in vitro transcript indicated that it was initiated and terminated internally on the template, with initiation taking place between nucleotides 2,926 and 2,934 and termination occurring between nucleotides 3,036 and 3,045 (Fig. 1). Closer examination of the relevant DNA sequence revealed that it corresponded to the sequence of spot 42 RNA (25) over the region 2,934 to 3,042. Additional fingerprinting experiments confirmed that our in vitro transcript matched the spot 42 RNA sequence indicated in Fig. 1. We observed the predicted changes in the two-dimensional pattern depending on whether the in vitro transcript was labeled with [α-32P]GTP, -ATP, or -UTP. Moreover, we confirmed our identification of certain oligonucleotides by complete alkaline hydrolysis and by secondary digestion with RNase A.

Although we have not attempted to make quantitative comparisons of the strength of the spot 42 promoter, our experiments suggest that it is a strong promoter, since we always obtain a high yield of the spot 42 transcript in an in vitro reaction, even in mixing experiments with fragments containing the E. coli trp promoter (which is similar in strength to lacUV5; H. Horowitz and T. Platt, personal communication). Transcription of spot 42 RNA in vitro is salt resistant, with an optimum at 150 mM KCl. By fingerprint analysis we have identified the transcript (about 190 nucleotides) that results from readthrough of


[Fig. 4 gel image. Labeled positions: Top, 650, 250, Readthrough, Spot 42.]

FIG. 4. In vitro transcription products from the 650-bp HpaII fragment shown in Fig. 3. The reaction contained 150 mM KCl, the optimum for spot 42 transcription. The spot 42 transcript and the product resulting from readthrough at the spot 42 terminator are indicated. The positions of marker transcripts, 250 and 650 nucleotides in length, are also shown.

the spot 42 terminator to the end of the HpaII template (Fig. 4). Measurement of the radioactivity incorporated into each transcript (with corrections for the length and composition of each) indicates that the spot 42 terminator is 99.4% efficient in vitro in the absence of rho. The high efficiency of this terminator is not surprising since its sequence corresponds to a near-perfect example of a prototype rho-independent terminator (21).

DNA sequencing. The sequence of spot 42 RNA suggested that it could be a leader transcript responsible for the regulation of one or more downstream genes. We therefore sequenced the DNA from the end of the polA gene as far as the HindIII site used in constructing our polA clones, in the expectation of finding such a downstream gene (sequence shown in Fig. 1). Figure 3A details our sequencing strategy and the extent of sequences determined from each restriction fragment. Although there are three regions where the sequence was determined on only one strand of the DNA (nucleotides 3,030 to 3,060, 3,090 to 3,280, and 3,390 to 3,540), most of this sequence was obtained from several independent restriction fragments, so that we did not have to rely on reading sequences a long distance from the labeled end. We are therefore confident that we have ruled out the possibility of errors that could mislead us in the identification of potential translational reading frames.

As discussed below, the only plausible coding region within this DNA sequence reads toward the spot 42 locus. Translation of this reading frame would give a polypeptide of 198 amino acids (22 kd), whose sequence is shown in Fig. 1. Since the existence of a gene downstream from, but reading towards, spot 42 would argue most strongly against a role for spot 42 RNA as a leader transcript, we undertook further experiments to establish whether this 22-kd reading frame was in fact translated.
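The terminator features cited earlier in Results (a guanine-cytosine-rich dyad symmetry followed by a run of T's, nucleotides 3,010 to 3,044) can be inspected with a short script. The sketch below is a hedged illustration: the 35-nucleotide string is the top strand of Fig. 1 over those coordinates as printed, and `best_hairpin` is a hypothetical brute-force search for the longest perfectly paired stem, not a thermodynamic folding method.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def best_hairpin(seq, min_loop=3, max_loop=8):
    """Return (stem_len, stem_start, loop_len) for the longest perfect hairpin."""
    best = (0, 0, 0)
    for left_end in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            right_start = left_end + loop + 1
            k = 0
            # Extend the stem outward while the two arms stay complementary.
            while (left_end - k >= 0 and right_start + k < len(seq)
                   and COMPLEMENT[seq[left_end - k]] == seq[right_start + k]):
                k += 1
            if k > best[0]:
                best = (k, left_end - k + 1, loop)
    return best


def t_run_length(seq, start):
    """Count consecutive T residues beginning at index start."""
    n = 0
    while start + n < len(seq) and seq[start + n] == "T":
        n += 1
    return n


# Assumption: top strand of Fig. 1, nucleotides 3,010 to 3,044, as printed.
TERMINATOR = "CGCCCCAGTCAGTAATGACTGGGGCGTTTTTTATT"

stem_len, stem_start, loop_len = best_hairpin(TERMINATOR)
left_arm = TERMINATOR[stem_start:stem_start + stem_len]
gc_in_stem = sum(base in "GC" for base in left_arm)
t_run = t_run_length(TERMINATOR, stem_start + 2 * stem_len + loop_len)
print(stem_len, loop_len, gc_in_stem, t_run)
```

On this input the search reports an 11-bp stem (8 of the 11 pairs G-C) closed by a 4-nucleotide loop and followed directly by six T residues, in line with the description of a near-perfect rho-independent terminator.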

Protein synthesis in maxicells. We used the maxicell technique of Sancar et al. (27, 29) to examine plasmid-directed protein synthesis. Our original plasmid clone of the polA region contained a 5-kb HindIII fragment of E. coli carrying the polA1 amber mutation (15). To focus on the region downstream of the polA gene, we constructed a simpler plasmid in which all E. coli DNA upstream of polA had been eliminated. This plasmid, pCJ4, was obtained by joining the BglII site just before the start of the polA structural gene to the BamHI site within the vector DNA (see Fig. 2). When we examined the labeled proteins synthesized from this plasmid in maxicells, we observed the expected bands at around 30 kd (β-lactamase), 36 kd (polA1 amber fragment), and 103 kd (DNA polymerase I, resulting from readthrough of the polA1 amber mutation). (The identity of the 36-kd band was established by observing a change in size when a different cloned amber allele of polA was used; the 103-kd band was identified with plasmids with γδ insertions in the appropriate region of the polA structural gene.) In addition, there were a number of faint bands in the 20- to 25-kd region. Clearly, some or all of these bands could have resulted from endogenous background synthesis or degradation of larger products. We therefore felt that the best strategy for identifying a protein product of the 22-kd reading frame was to see whether one of these bands could be eliminated by making a lesion within the presumed coding region.

Our strategy was to obtain a derivative of pCJ4 having an insertion of the transposable element γδ within the presumed 22-kd coding region and then to use a suitable unique restriction site within γδ to create a deletion. We made derivatives of pCJ4 carrying random insertions of γδ by selecting for transfer of this normally non-mobilizable plasmid mediated by F carrying an active γδ (12). Transfer of such plasmids presumably occurs by the transient formation of a cointegrate which, upon resolution, leaves a copy of γδ on the target plasmid. (The use of γδ insertion mutants obtained in this way in the identification of structural genes has been described in detail in references 28 and 29.)

As described in Materials and Methods, we identified by restriction mapping three plasmids (of 24 tested) in which γδ had inserted close to nucleotide 3,600 on our sequence, within the first one-third of the presumed 22-kd reading frame. One of the three (pCJ4-γδ2) was oriented such that the so-called δ end of the element was closest to the end of the polA gene. We made use of the unique BamHI site close to this end (12) to delete all of the DNA between this BamHI site and the HindIII site at the end of the polA cloned fragment. In this way we obtained plasmid pCJ40, which differs from pCJ4 in the absence of the region from about 3,620 to 3,827 covering the start of the 22-kd reading frame and the addition of about 400 bp from the δ end of γδ. Details of this construction are shown in Fig. 2. When we examined the proteins synthesized from pCJ4 and pCJ40 in maxicells, we found that a band within the predicted size range was produced from pCJ4 but not from pCJ40 (Fig. 5), suggesting that the 22-kd coding region we have identified is indeed functional.
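The predicted size of the polypeptide follows from the coordinates of the reading frame. The sketch below is a rough estimate under stated assumptions: the ATG at 3,764 and the TAA at 3,170 (positions given in the Discussion) are taken as the first bases of the start and stop codons on the bottom strand, and 110 daltons is used as a common rule-of-thumb average residue mass rather than a value from the paper. All names are hypothetical.

```python
ATG_FIRST_BASE = 3764   # start codon, bottom strand (numbering runs downward)
STOP_FIRST_BASE = 3170  # TAA stop codon, bottom strand
AVG_RESIDUE_DA = 110    # assumption: rule-of-thumb mean residue mass


def codons_between(start_first_base, stop_first_base):
    """Number of sense codons from the start codon up to, not including, the stop."""
    span = abs(start_first_base - stop_first_base)
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


n_residues = codons_between(ATG_FIRST_BASE, STOP_FIRST_BASE)
mass_kd = n_residues * AVG_RESIDUE_DA / 1000
print(n_residues, round(mass_kd, 1))
```

The estimate comes to 198 residues and roughly 21.8 kd, which is the size range in which the band missing from pCJ40 would be expected.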

Location of the promoter for the 22-kd reading frame. Examination of the DNA sequence preceding the start of the 22-kd reading frame revealed a region having some homology to the consensus sequence for promoters recognized by E. coli RNA polymerase (24). The sequence TAATTTG (3,795 to 3,789) contains the most important bases of the Pribnow box consensus sequence, and TTGACT from 3,822 to 3,817 provides an excellent -35 region, although the 21-bp spacer between these two elements is

[Fig. 5 gel image. Labeled bands: 103 (pol I), 36 (polA1 amber), Tet, Amp.]

FIG. 5. Labeled products from maxicell protein synthesis displayed on a 12% sodium dodecyl sulfate-polyacrylamide gel (17). Plasmid pNG16 (10) is a pBR322 derivative which comprises the vector portion of the plasmids used in this work and gives rise to protein products derived from the ampicillin and tetracycline resistance determinants. Plasmids pCJ4 and pCJ40 are described in the text and in Fig. 2. The control lane was obtained by labeling CSR603 maxicells containing no plasmid. Protein sizes are in kilodaltons; 14C-labeled methylated carbonic anhydrase (Amersham Corp.) was run as a marker (30 kd) in an adjacent slot (not shown). Arrow indicates the position of the presumed product of the 22-kd reading frame.

substantially longer than the optimum 17 bp (34). An alternative but less good -35 sequence, CTGACG, is present at the optimum distance from the Pribnow box. We tried to determine whether this region could function as a promoter by cloning the HindIII-HincII fragment that includes the start of the 22-kd coding region (see Fig. 3B) into the pKO-1 plasmid of McKenney et al. (20). This system allows detection of promoter sequences by the resulting expression of a downstream galK gene. The cloned HindIII-HincII fragment showed weak promoter activity as judged by the phenotype on MacConkey-galactose indicator plates of N100 (galK recA) carrying the appropriate plasmid. Quantitation of the galactokinase produced by these strains showed that the cloned fragment directed the synthesis of about 13 U of galactokinase above the normal background (12 U) produced by the pKO-1 vector. This is about 1% of the activity of the induced lacUV5 promoter cloned in the same vector. The weakness of the presumed promoter for the 22-kd reading frame is consistent with the low level of expression of the corresponding protein band in maxicells (Fig. 5).
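The spacer lengths quoted for the candidate promoter elements can be recomputed from the sequence. The sketch below is illustrative and rests on one assumption: the last top-strand row of Fig. 1 (3,741 to 3,827) is correct as printed. The bottom strand, read 5' to 3' from 3,827 downward, is obtained as the reverse complement, and the helper names are hypothetical.

```python
# Last top-strand row of Fig. 1 (nucleotides 3,741 to 3,827), as printed.
TOP_3741_3827 = ("GTGGCGAATATCAGGCGCACTCATCACAAAATGCGTCTGTTGATAATTCAAATTAGTCAA"
                 "AGCGGTCGTCTCCGTCAGTCAAAGCTT")
LAST_COORD = 3827


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def coord(index):
    """Figure coordinate of a bottom-strand index (index 0 is 3,827)."""
    return LAST_COORD - index


def spacer_length(strand, minus35, minus10):
    """Bases between the end of the -35 hexamer and the start of the -10 box."""
    i35 = strand.find(minus35)
    i10 = strand.find(minus10, i35 + len(minus35))
    if i35 < 0 or i10 < 0:
        raise ValueError("promoter element not found")
    return i10 - (i35 + len(minus35))


bottom = revcomp(TOP_3741_3827)
spacer_main = spacer_length(bottom, "TTGACT", "TAATTTG")
spacer_alt = spacer_length(bottom, "CTGACG", "TAATTTG")
print(coord(bottom.find("TTGACT")), coord(bottom.find("TAATTTG")))
print(spacer_main, spacer_alt)
```

With the printed sequence, TTGACT starts at 3,822 and TAATTTG at 3,795, giving a 21-bp spacer, while the alternative CTGACG hexamer sits 17 bp from the Pribnow box.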

DISCUSSION

Spot 42 promoter region. Location of the DNA sequence corresponding to spot 42 RNA has allowed us to identify the sequences controlling initiation of this transcript. Surprisingly, in view of the abundance of spot 42 RNA in vivo and the high yield obtained in our in vitro transcription reactions, the sequences preceding the transcription initiation site show rather poor homology with typical bacterial promoter sequences (24). The best match at the -10 region (prototype TATAATPu) is TAAAGTT, and it is difficult to detect any convincing homology to the -35 consensus sequence (TTGACA). In some respects the spot 42 promoter shows the most similarity to the first promoters of rRNA operons (8). The latter also have atypical -35 regions, with the TTG but not the ACA of the consensus sequence conserved, and, like the spot 42 promoter, they have long tracts of A or T residues preceding the -35 region. The best homology is between the spot 42 promoter and the rrnD1 promoter (39) over the -40 to -70 region (Fig. 6). The homology between these two promoters is less pronounced between -40 and the transcriptional initiation point. In particular, the guanine-cytosine-rich consensus sequence (-7 to -1 on the rrnD1 sequence) postulated to play a role in stringent response (35) is not present in the spot 42 sequence, consistent with the observed lack of effect of amino acid starvation on spot 42 RNA synthesis (26).

Other interesting features of the DNA sequence of the spot 42 promoter region are the imperfect dyad symmetry centered around -40 and the tandemly repeated TGAAC between -20 and -30. These sequence patterns (shown in Fig. 6) are possible candidates for regulatory sequences controlling spot 42 expression. In examining the DNA sequence preceding the spot 42 transcriptional start, one must bear in mind that the region between 2,788 and 2,834 probably contains the polA terminator as well as the spot 42 promoter and that some of the sequence features we have noted may be associated with the former.

Genes downstream from spot 42. The spot 42 RNA sequence (25) shows some of the features typical of the leader RNAs of biosynthetic operons regulated by attenuation (reviewed in reference 38). If spot 42 were a leader RNA, one would predict, by analogy with earlier examples, that translation of the encoded peptide would most probably sense the level of leucine in the cell and in certain circumstances mediate readthrough of the terminator, allowing expression of one or more downstream structural genes. We obtained DNA sequence data extending for nearly 800 bp beyond the spot 42 terminator but could not find any reading frame that would correspond to a downstream gene having spot 42 RNA as a leader transcript. There are only two potential translational frames reading in the same direction as spot 42 RNA and having a coding capacity of more than 30 amino acids (shown in Fig. 3C). We consider it unlikely that either of these reading frames constitutes a potential downstream gene for the following reasons. The coding regions are a considerable distance beyond the spot 42 terminator, the closest being 300 bp away, and the encoded polypeptides are quite short (89 and 55 amino acids). Their amino acid composition is close to that expected from random DNA sequence, with the abundance of each amino acid roughly proportional to the number of codons assigned to it in the genetic code. Moreover, none of the usual codon preferences found in E. coli (13, 14) are observed in these reading frames.
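A search for reading frames of the kind summarized in Fig. 3C (an initiation codon followed by an open frame above a length threshold) can be sketched as below. This is a generic illustration, not the procedure used in the paper: only ATG is treated as an initiation codon, the function name `find_orfs` is hypothetical, and the demonstration input is a short stretch around the spot 42 peptide rather than the full 1-kb region.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons):
    """List (strand, start, end, n_codons) for ATG-initiated open frames.

    start and end are 0-based indices on the strand being read; for the
    '-' strand they refer to the reverse complement of seq.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the frame
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs


# Demonstration input: spot 42 reading frame from Fig. 1 with short flanks.
DEMO = ("GGTAAG" "ATGTTCTATCTTTCAGACCTTTTACTTCACGTAATCGGATTTGGCTGA" "ATATT")
demo_orfs = find_orfs(DEMO, min_codons=10)
print(demo_orfs)
```

Applied to the whole region with the thresholds given in the text, a scan of this kind would list candidates on both strands; judging which are plausible genes still requires the composition and codon-usage arguments made in the Discussion.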

Further examination of the DNA sequence revealed a much more plausible translational reading frame on the opposite strand, extending from the ATG at position 3,764 to the TAA at 3,170 (see Fig. 1 and 3). Translation of this region would give a polypeptide of 198 amino acids (22 kd). Not only is the amino acid composition of this polypeptide much closer to that of an average E. coli protein (7), but also the choice of codons for each amino acid follows the biases generally observed in E. coli genes, which can be rationalized in terms of preferential use of the optimum tRNA isoacceptor (13, 14). Examples of such biased codon distribution in this reading frame include the preference for the CUG codon for leucine (15 of 24), AUY for isoleucine (6 of 6), CGY for arginine (11 of 12), GGY for glycine (10 of 13), GAA for glutamic acid (11 of 14), and


-70 -60 -50 -40 -30 -20 -10 +1

rrnD1   AAACAACAAACAGAAAAAAAGATCAAAAAAATACTTGTGCAAAAAATTGGGATCCCTATAATGCGCCTCCG

spot 42 ATGCAACAACTGAAAAAAAATTACAAAAAGTGCTTTCTGAACTGAACAAAAAAGAGTAAAGTTAGTCGCG
        TACGTTGTTGACTTTTTTTTAATGTTTTTCACGAAAGACTTGACTTGTTTTTTCTCATTTCAATCAGCGC

-60 -50 -40 -30 -20 -10 +1

FIG. 6. Sequence comparison of the spot 42 promoter region (nucleotides 2,865 to 2,934 in Fig. 1) with the first promoter of the rrnD operon (8, 39). The two sequences are written so that their Pribnow boxes are lined up. The top strand only is shown for the rrnD1 promoter, and bases are underlined to indicate homology with the corresponding strand of the spot 42 promoter. The features indicated by arrows on the spot 42 sequence are noted in the text.
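The sequence features of Fig. 6 can be checked mechanically. The sketch below is illustrative only: the strings are transcribed from the figure as printed here (an assumption, since any transcription error would carry through), the helper names are hypothetical, and the offsets it reports are 0-based positions within the displayed 70-nucleotide segment rather than promoter coordinates. It confirms that the two spot 42 strands are complementary and locates the tandemly repeated TGAAC.

```python
# Sequences transcribed from Fig. 6; treat them as assumptions of this sketch.
SPOT42_TOP = (
    "ATGCAACAACTGAAAAAAAATTACAAAAAGTGCTTTCTGA"
    "ACTGAACAAAAAAGAGTAAAGTTAGTCGCG"
)
# Bottom strand as drawn in the figure (read left to right beneath the top strand).
SPOT42_BOTTOM = (
    "TACGTTGTTGACTTTTTTTTAATGTTTTTCACGAAAGACT"
    "TGACTTGTTTTTTCTCATTTCAATCAGCGC"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(_COMPLEMENT)


def tandem_repeat_starts(seq, unit, copies=2):
    """0-based offsets at which `unit` occurs `copies` times in direct succession."""
    target = unit * copies
    starts = []
    pos = seq.find(target)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(target, pos + 1)
    return starts


if __name__ == "__main__":
    print("strands complementary:", complement(SPOT42_TOP) == SPOT42_BOTTOM)
    print("TGAAC tandem repeat offsets:", tandem_repeat_starts(SPOT42_TOP, "TGAAC"))
```

The same `tandem_repeat_starts` helper could be pointed at other short motifs; the search for the imperfect dyad symmetry is left out because it would need a tolerance for mismatches that the text does not specify.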

AAA for lysine (11 of 15). As described in Results, we have two lines of evidence that suggest that this reading frame is expressed. First, we have observed a polypeptide of the expected size in maxicell protein synthesis and, by using a deleted plasmid, have assigned this product to the 22-kd region. Second, we have identified a functional promoter sequence preceding the start of this reading frame.

Possible function of spot 42 RNA. Our failure to locate a downstream gene reading in the same direction as spot 42 is reinforced by the detection of a functional coding region reading in the opposite direction and provides a convincing argument against spot 42 being a leader transcript involved in regulation by attenuation. What, then, is the function of spot 42 RNA? We think it is a reasonable possibility that the short reading frame contained within the transcript is in fact translated to give the 15-residue peptide whose sequence is shown in Fig. 1. The presence of an open reading frame, an initiation codon, and a correctly positioned ribosome binding sequence (30) within a mere 109-nucleotide transcript seems unlikely to be just coincidental. The observed stable association of spot 42 RNA with ribosomes (26) may be a consequence of its translation. Since spot 42 RNA is more abundant (150 to 200 copies per cell) than most E. coli mRNAs, a translated product would probably also be abundant. From the sequence, the peptide would be extremely hydrophobic and therefore more likely to be found associated with the membrane than in the cytoplasm. In considering the possible association between this peptide and the membrane, it is interesting to note that the two polar residues (Asp and His) are positioned such that they might be able to adopt a conformation so as to neutralize each other's charge. Moreover, the peptide itself would be just long enough to span the membrane if it adopted the extended 3₁₀ helical conformation (9). It is intriguing to note that negative control by cyclic AMP (as shown for spot 42 RNA production) has also been demonstrated for major outer membrane protein III of E. coli (18) and for cell division functions in some E. coli strains (36). If the spot 42 gene product does indeed have a membrane location, these observations may suggest a role for cyclic AMP in the control of membrane biosynthesis.

At present we do not know whether the spot 42 transcript or the 22-kd coding frame corresponds to known genetic loci. In the most recent E. coli genetic map (1), the polA gene is flanked by the glnA and rrnA genes. Rice and Dahlberg (23) have located these genes at either end of a 20-kb DNA fragment that has polA approximately at its center. However, since most of the genes between rrnA and metE have only been approximately mapped, it is possible that some may lie closer to polA than was supposed.
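The figures quoted earlier for the 22-kd coding frame are internally consistent, as a short calculation shows. The Python sketch below rests on two assumptions that are not stated in the text: that positions 3,764 and 3,170 mark the first base of the ATG and of the TAA in the direction of translation, and that an average amino acid residue contributes about 110 daltons (a common rule of thumb). The uniform-usage fractions are simply the share of synonymous codons in the standard genetic code that belong to each preferred class.

```python
# Coordinates and codon counts as quoted in the text.
ATG_POSITION = 3764
TAA_POSITION = 3170
AVERAGE_RESIDUE_DALTONS = 110  # assumption: rule-of-thumb average residue mass


def coding_codons(start, stop):
    """Number of sense codons between the first base of the start and stop codons."""
    span = abs(start - stop)
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


def approximate_mass_kd(residues, residue_mass=AVERAGE_RESIDUE_DALTONS):
    """Rough polypeptide mass in kilodaltons."""
    return residues * residue_mass / 1000.0


# (preferred codon class, amino acid): (observed uses, total codons for that
# amino acid, fraction expected if synonymous codons were used uniformly).
CODON_PREFERENCES = {
    ("CUG", "Leu"): (15, 24, 1 / 6),
    ("AUY", "Ile"): (6, 6, 2 / 3),
    ("CGY", "Arg"): (11, 12, 2 / 6),
    ("GGY", "Gly"): (10, 13, 2 / 4),
    ("GAA", "Glu"): (11, 14, 1 / 2),
    ("AAA", "Lys"): (11, 15, 1 / 2),
}

if __name__ == "__main__":
    n = coding_codons(ATG_POSITION, TAA_POSITION)
    print(f"{n} codons, about {approximate_mass_kd(n):.1f} kd")
    for (codon, aa), (used, total, uniform) in CODON_PREFERENCES.items():
        print(f"{aa} {codon}: observed {used / total:.2f}, uniform {uniform:.2f}")
```

In every class listed, the observed fraction exceeds the uniform-usage fraction, which is the sense in which the reading frame follows the E. coli codon bias.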

Intercistronic regions. A remarkable feature of the DNA sequence downstream of the polA gene is the shortness of the intercistronic spacer regions. There are only 150 bp between the end of the polA coding sequence and the start of the spot 42 RNA sequence and 125 bp between the end of the spot 42 transcript and the termination codon of the 22-kd reading frame. Since these spacer regions must also contain sequences controlling initiation and termination of transcription of the relevant operons, there is probably very little redundant DNA in this 1-kb segment of the E. coli genome. Other DNA sequencing studies suggest that tight clustering of apparently unrelated genes may be a fairly common feature of E. coli chromosome organization. Grundstrom and Jaurin (11) have recently demonstrated actual overlap between the ampC and frd operons, and several groups have found open reading frames close to other (probably independently transcribed) genes (2, 4, 6, 16, 31, 32).

ACKNOWLEDGMENTS

We are grateful to Cynthia Flood for expert technical assistance, to Terry Platt for teaching us the techniques of in vitro transcription and RNA fingerprinting, to Aziz Sancar for teaching us the maxicell method, and to Phil Rice and Jim Dahlberg for communicating their results before publication.

This work was supported by Public Health Service grant GM-28550 from the National Institutes of Health.

LITERATURE CITED

1. Bachmann, B. J., and K. B. Low. 1980. Linkage map of Escherichia coli K-12, edition 6. Microbiol. Rev. 44:1-56.

2. Beck, E., and E. Bremer. 1980. Nucleotide sequence of the gene ompA coding the outer membrane protein II* of Escherichia coli K-12. Nucleic Acids Res. 8:3011-3024.

3. Birnboim, H. C., and J. Doly. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7:1513-1523.

4. Brosius, J., T. J. Dull, D. D. Sleeter, and H. F. Noller. 1981. Gene organization and primary structure of a ribosomal RNA operon from Escherichia coli. J. Mol. Biol. 148:107-127.

5. Burgess, R. R., and J. J. Jendrisak. 1975. A procedure for the rapid, large-scale purification of Escherichia coli DNA-dependent RNA polymerase involving polymin P precipitation and DNA-cellulose chromatography. Biochemistry 14:4634-4638.

6. Clement, J. M., and M. Hofnung. 1981. Gene sequence of the λ receptor, an outer membrane protein of E. coli K12. Cell 27:507-514.

7. Dayhoff, M. O., L. T. Hunt, and S. Hurst-Calderone. 1978. Composition of proteins, p. 363-369. In M. O. Dayhoff (ed.), Atlas of protein sequence and structure, vol. 5, suppl. 3. National Biomedical Research Foundation, Washington, D.C.

8. de Boer, H. A., S. F. Gilbert, and M. Nomura. 1979. DNA sequences of promoter regions for rRNA operons rrnE and rrnA in E. coli. Cell 17:201-209.

9. Engelman, D. M., and T. A. Steitz. 1981. The spontaneous insertion of proteins into and across membranes: the helical hairpin hypothesis. Cell 23:411-422.

10. Grindley, N. D. F., and C. M. Joyce. 1980. Genetic and DNA sequence analysis of the kanamycin resistance transposon Tn903. Proc. Natl. Acad. Sci. U.S.A. 77:7176-7180.

11. Grundstrom, T., and B. Jaurin. 1982. Overlap between ampC and frd operons on the Escherichia coli chromosome. Proc. Natl. Acad. Sci. U.S.A. 79:1111-1115.

12. Guyer, M. S. 1978. The γδ sequence of F is an insertion sequence. J. Mol. Biol. 126:347-365.

13. Ikemura, T. 1981. Correlation between the abundance of Escherichia coli transfer RNA's and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146:1-21.

14. Ikemura, T. 1981. Correlation between the abundance of Escherichia coli transfer RNA's and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389-409.

15. Joyce, C. M., W. S. Kelley, and N. D. F. Grindley. 1982. Nucleotide sequence of the E. coli polA gene and primary structure of DNA polymerase I. J. Biol. Chem. 257:1958-1964.

16. Kikuchi, Y., K. Yoda, M. Yamasaki, and G. Tamura. 1981. The nucleotide sequence of the promoter and the amino-terminal region of alkaline phosphatase structural gene (phoA) of Escherichia coli. Nucleic Acids Res. 9:5671-5678.

17. Laemmli, U. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685.

18. Mallick, U., and P. Herrlich. 1979. Regulation of synthesis of a major outer membrane protein: cyclic AMP represses Escherichia coli protein III synthesis. Proc. Natl. Acad. Sci. U.S.A. 76:5520-5523.

19. Maxam, A., and W. Gilbert. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-560.

20. McKenney, K., H. Shimatake, D. Court, U. Schmeissner, C. Brady, and M. Rosenberg. 1981. A system to study promoter and terminator signals recognized by Escherichia coli RNA polymerase, p. 383-415. In J. G. Chirikjian and S. Takis (ed.), Gene amplification and analysis, vol. 2. Elsevier/North-Holland, Amsterdam.

21. Platt, T. 1981. Termination of transcription and its regulation in the tryptophan operon of E. coli. Cell 24:10-23.

22. Reed, R. R., R. A. Young, J. A. Steitz, N. D. F. Grindley, and M. S. Guyer. 1979. Transposition of the Escherichia coli insertion element γδ generates a five-base-pair repeat. Proc. Natl. Acad. Sci. U.S.A. 76:4882-4886.

23. Rice, P. W., and J. E. Dahlberg. 1982. A gene between polA and glnA retards growth of Escherichia coli when present in multiple copies: physiological effects of the gene for spot 42 RNA. J. Bacteriol. 152:1196-1210.

24. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.

25. Sahagan, B. G., and J. E. Dahlberg. 1979. A small, unstable RNA molecule of Escherichia coli: spot 42 RNA. I. Nucleotide sequence analysis. J. Mol. Biol. 131:573-592.

26. Sahagan, B. G., and J. E. Dahlberg. 1979. A small, unstable RNA molecule of Escherichia coli: spot 42 RNA. II. Accumulation and distribution. J. Mol. Biol. 131:593-605.

27. Sancar, A., A. M. Hack, and W. D. Rupp. 1979. Simple method for identification of plasmid-coded proteins. J. Bacteriol. 137:692-693.

28. Sancar, A., and W. D. Rupp. 1979. Cloning of uvrA, lexC and ssb genes of Escherichia coli. Biochem. Biophys. Res. Commun. 90:123-129.

29. Sancar, A., R. P. Wharton, S. Seltzer, B. M. Kacinski, N. D. Clarke, and W. D. Rupp. 1981. Identification of the uvrA gene product. J. Mol. Biol. 148:45-62.

30. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U.S.A. 71:1342-1346.

31. Smith, D. R., and J. M. Calvo. 1980. Nucleotide sequence of the E. coli gene coding for dihydrofolate reductase. Nucleic Acids Res. 8:2255-2274.

32. Squires, C., A. Krainer, G. Barry, W.-F. Shen, and C. L. Squires. 1981. Nucleotide sequence at the end of the gene for the RNA polymerase β' subunit (rpoC). Nucleic Acids Res. 9:6827-6840.

33. Squires, C., F. Lee, K. Bertrand, C. L. Squires, M. J. Bronson, and C. Yanofsky. 1976. Nucleotide sequence of the 5' end of tryptophan messenger RNA of Escherichia coli. J. Mol. Biol. 103:351-381.

34. Stefano, J. E., and J. D. Gralla. 1982. Spacer mutations in the lac ps promoter. Proc. Natl. Acad. Sci. U.S.A. 79:1069-1072.

35. Travers, A. A. 1980. Promoter sequence for stringent control of bacterial ribonucleic acid synthesis. J. Bacteriol. 141:973-976.

36. Utsumi, R., H. Tanabe, Y. Nakamoto, M. Kawamukai, H. Sakai, M. Himeno, T. Komano, and Y. Hirota. 1981. Inhibitory effect of adenosine 3',5'-phosphate on cell division of Escherichia coli K-12 mutant derivatives. J. Bacteriol. 147:1105-1109.

37. Wu, A. M., G. E. Christie, and T. Platt. 1981. Tandem termination sites in the tryptophan operon of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 78:2913-2917.

38. Yanofsky, C. 1981. Attenuation in the control of expression of bacterial operons. Nature (London) 289:751-758.

39. Young, R. A., and J. A. Steitz. 1979. Tandem promoters direct E. coli ribosomal RNA synthesis. Cell 17:225-234.
