
doi:10.1006/jmbi.2000.3729 available online at http://www.idealibrary.com on J. Mol. Biol. (2000) 299, 27–51

Genomic Sequences of Bacteriophages HK97 and HK022: Pervasive Genetic Mosaicism in the Lambdoid Bacteriophages

Robert J. Juhala, Michael E. Ford, Robert L. Duda, Anthony Youlton, Graham F. Hatfull and Roger W. Hendrix*

Pittsburgh Bacteriophage Institute and Department of Biological Sciences, University of Pittsburgh, Pittsburgh PA 15260, USA

Present addresses: R. J. Juhala, Department of Internal Medicine, Allegheny General Hospital, Pittsburgh, PA, USA; M. E. Ford, Division of Gastroenterology and Hepatology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA; A. Youlton, Valspar Corporation, Pittsburgh, PA 15233, USA.

E-mail address of the corresponding author: [email protected]

0022-2836/00/010027–25 $35.00/0

We report the complete genome DNA sequences of HK97 (39,732 bp) and HK022 (40,751 bp), double-stranded DNA bacteriophages of Escherichia coli and members of the lambdoid or λ-like group of phages. We provide a comparative analysis of these sequences with each other and with two previously determined lambdoid family genome sequences, those of E. coli phage λ and Salmonella typhimurium phage P22.

The comparisons confirm that these phages are genetic mosaics, with mosaic segments separated by sharp transitions in the sequence. The mosaicism provides clear evidence that horizontal exchange of genetic material is a major component of evolution for these viruses. The data suggest a model for evolution in which diversity is generated by a combination of illegitimate and homologous recombination and mutational drift, and selection for function produces a population in which most of the surviving mosaic boundaries are located at gene boundaries or, in some cases, at protein domain boundaries within genes. Comparisons of these genomes highlight a number of differences that allow plausible inferences of specific evolutionary scenarios for some parts of the genome.

The comparative analysis also allows some inferences about the function of genes or other genetic elements. We give examples for the generalized recombination genes of HK97, HK022 and P22, and for a putative head-tail adaptor protein of HK97 and HK022. We also use the comparative approach to identify a new class of genetic elements, the morons, which consist of a protein-coding region flanked by a putative σ70 promoter and a putative factor-independent transcription terminator, all located between two genes that may be adjacent in a different phage. We argue that morons are autonomous genetic modules that are expressed from the repressed prophage. Sequence composition of the morons implies that they have entered the phages' genomes by horizontal transfer in relatively recent evolutionary time.

© 2000 Academic Press

Keywords: bacteriophage evolution; comparative genomics; genetic mosaic; horizontal exchange; lambdoid phages

*Corresponding author

Introduction

The double-stranded DNA (dsDNA) tailed bacteriophages are very likely the most numerous group of life forms in the biosphere (Bergh et al., 1989; Wommack & Colwell, 2000), and they may be nearly as ancient as their bacterial and archaeal hosts. This group of viruses constitutes a valuable resource for investigating such issues as the genetic structure and mechanisms of evolution in a large population. The availability of methods for facile


28 Bacteriophage HK97 and HK022 Genomes

DNA sequence determination, together with the modest sizes of phage genomes, means that such studies can be carried out at the resolution of the genome sequence.

We report here the genome sequences of two members of the lambdoid family of phages, Escherichia coli phages HK97 and HK022. In analyzing their sequences, we emphasize comparisons between these two phages and with two other lambdoid genomic sequences, those of E. coli phage λ (Lederberg, 1951; Sanger et al., 1982) and Salmonella typhimurium phage P22 (Zinder & Lederberg, 1952). The lambdoid phages are temperate phages, defined (Campbell & Botstein, 1983) as being capable of productive genetic recombination with λ, the prototype of the group. λ and P22 are among the best studied of viruses, with all major aspects of their life-cycles having received detailed scrutiny over the past nearly 50 years. λ was identified in 1951 (Lederberg, 1951) as a prophage in the K12 strain of E. coli, which had been in laboratory culture since its isolation from a human patient in California in 1922. P22 was isolated in 1952 (Zinder & Lederberg, 1952) as a prophage from S. typhimurium strain LT22, which was isolated in either Sweden or Chile (the record is equivocal) prior to 1948 (Lilleengen, 1948). HK97 and HK022 were both isolated in about 1975 in Hong Kong (Dhillon et al., 1980; Dhillon & Dhillon, 1976). HK97 has been studied primarily with regard to assembly and structure of the head (Hendrix & Duda, 1998), and studies of HK022 have concentrated on early functions, including regulation of transcription, prophage integration, and immunity (Weisberg et al., 2000).

Comparisons of lambdoid phage genome relationships and organization were first carried out about 30 years ago by the method of DNA heteroduplex mapping (Simon et al., 1971), and a number of additional studies have since been carried out using genetic and DNA sequencing methods. These studies indicate that the lambdoid phages share a common order and organization of genetic functions along their genomes and that they appear to have a mosaic structure: in any particular pairwise comparison of lambdoid phages, some regions of the genome match in sequence and other regions do not. These data have supported some rather detailed discussions of the mechanisms by which these viruses evolve (Botstein, 1980; A. Campbell, 1988; A. M. Campbell, 1994; Casjens et al., 1992; Susskind & Botstein, 1978), and data from other phages, particularly the virulent phages of E. coli and phages of various Gram-positive hosts, suggest that these principles apply more generally (Monod et al., 1997; Lucchini et al., 1999; Ford et al., 1998). The availability of enough complete genome sequences to allow multiple pairwise comparisons of lambdoid phage genomes at the level of DNA sequence, as we report here, allows the existing view of lambdoid phage evolution to be refined and amplified, and contributes to a still very incomplete picture of the population structure of the lambdoid phages. Somewhat unexpectedly, these studies also lead to inferences about some functional aspects of the phages, which would have been much less evident without the comparative approach taken here.

Results

Genome sequence determination

We determined the nucleotide sequences of the genomes of bacteriophages HK97 and HK022 using the dideoxy-terminator method (Sanger et al., 1977) on libraries of random clones, as described in Materials and Methods. The HK97 genome is 39,732 bp long, and the HK022 genome is 40,751 bp. The two sequences are identical in the regions surrounding the maturation cleavage site, and by sequencing off the ends of virion DNA, we established that both phages have ten-base 3′ single-stranded extensions (3′-GCGGCGGTTT...-5′ and 5′-...CGCCGCCAAA-3′) at the left and right ends of their mature DNA, respectively.
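The two overhangs quoted above are complementary to each other, so the mature ends are cohesive. The short Python check below only illustrates that base-by-base pairing; the constant and function names are ours, not the authors'. It pairs each base of the left overhang, written 3′→5′, with the facing base of the right overhang, written 5′→3′.

```python
# Watson-Crick pairing partners for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# The two 10-nt 3' single-stranded extensions reported in the text.
# The left-end overhang is written 3'->5' and the right-end overhang 5'->3',
# so position i of one faces position i of the other when the ends anneal.
LEFT_3_TO_5 = "GCGGCGGTTT"
RIGHT_5_TO_3 = "CGCCGCCAAA"


def ends_anneal(left_3_to_5: str, right_5_to_3: str) -> bool:
    """True if every base of one overhang pairs with the facing base of the other."""
    if len(left_3_to_5) != len(right_5_to_3):
        return False
    return all(PAIR[a] == b for a, b in zip(left_3_to_5, right_5_to_3))


if __name__ == "__main__":
    print(ends_anneal(LEFT_3_TO_5, RIGHT_5_TO_3))  # True
```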

Assignments of probable genes

We identified open reading frames and assigned probable start sites for genes primarily by visual inspection of six-frame translations of the DNA sequences, supplemented with predictions of coding potential from the GeneMark program (Borodovsky & McInich, 1993) and the programs of the Staden sequence analysis package (Staden, 1986). In almost all cases, assignment of probable gene start sites was unambiguous, based on the presence of an initiation codon (AUG, except for one case of GUG in HK97 and four cases in HK022) appropriately positioned relative to a plausible Shine-Dalgarno ribosome-binding sequence and situated toward the beginning of an open reading frame with good coding potential. In roughly 60% of the genes assigned in this way, the initiation codon is within 15 bp of, and frequently overlaps, the termination codon of the upstream gene. For the majority of the exceptions to this rule of close packing, the space between genes accommodates a known or suspected transcription control signal. For a small number of genes (genes 6 and 42.1 of both phages, genes 8 and 57 of HK022) there is no plausible Shine-Dalgarno sequence in the usual position, and we have no evidence about the functionality of these genes. In the case of gene 6 of both phages, a possible Shine-Dalgarno sequence does occur farther upstream, positioned in such a way that it might be brought into position by formation of a stem-loop structure in the mRNA. (Such an arrangement has been suggested for gene 38 of phage T4 (Gold, 1988).)
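The first step described above, listing candidate open reading frames in all six frames, can be sketched as follows. This is not a reimplementation of GeneMark or the Staden package, and it does not score coding potential or Shine-Dalgarno sequences. It is a minimal illustration under stated assumptions: the standard stop codons, ATG and GTG (the DNA forms of AUG and GUG) as starts, and an arbitrary minimum length. All names are hypothetical.

```python
from typing import Iterator, List, Tuple

STARTS = {"ATG", "GTG"}          # AUG and GUG starts, written as DNA codons
STOPS = {"TAA", "TAG", "TGA"}    # standard-code stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _strand_orfs(seq: str, min_codons: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (frame, start, end) for start..stop ORFs on one strand (0-based, end exclusive).

    Within each frame, the first start codon after a stop opens an ORF and the
    next in-frame stop closes it. Length is counted in codons, stop included.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon in STARTS:
                    start = i
            elif codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield frame, start, i + 3
                start = None


def six_frame_orfs(seq: str, min_codons: int = 50) -> List[Tuple[str, int, int, int]]:
    """ORFs on both strands as (strand, frame, start, end) in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", f, s, e) for f, s, e in _strand_orfs(seq, min_codons)]
    rc = reverse_complement(seq)
    # An ORF at [s, e) on the reverse complement maps to [n - e, n - s) on the forward strand.
    hits += [("-", f, n - e, n - s) for f, s, e in _strand_orfs(rc, min_codons)]
    return hits


if __name__ == "__main__":
    demo = "CCATGAAATTTGGGTAACC"               # toy sequence, not phage DNA
    print(six_frame_orfs(demo, min_codons=2))  # [('+', 2, 2, 17)]
```

A real gene caller would then rank these candidates by coding potential and by the position of a ribosome-binding site, which is where the program-assisted and visual judgments described above come in.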

We established various degrees of confirmation of these gene assignments by several different means. Roughly half of the genes of each of the HK phages are homologous to genes in phages λ or P22 (and occasionally other lambdoid phages), as judged by sequence similarities that range from ~30% identity in the predicted amino acid sequences up to 98% identity in the nucleotide sequences. Most of these λ and P22 genes have been characterized functionally, and we have assumed that their start sites and functions carry over to the homologous genes of HK97 and HK022. This group of genes includes the cI, cII, cIII, N and Q regulatory genes (λ nomenclature), DNA replication genes, generalized and site-specific recombination genes, lysis genes, some of the genes located to the right of the DNA replication genes, tail fiber genes, and half of the tail genes. In earlier studies (Duda et al., 1995a,b), we characterized three HK97 head proteins, including determining their amino-terminal amino acid sequences, and these data confirm the gene assignments. The head gene region of the HK97 sequence is nearly identical with the corresponding part of the HK022 sequence, and patterns of head proteins on an SDS/polyacrylamide gel are indistinguishable for the two phages (R.W.H., unpublished results), so we have carried these assignments over to HK022. We have determined the amino-terminal sequence of the major tail subunit for HK022 (data not shown), and these data identify the corresponding gene and confirm its start site. Many of the early genes of HK022 have been characterized genetically and biochemically in other laboratories (summarized by Weisberg et al., 2000), and these studies have guided, and agree with, our analysis of these HK022 genes. As reported previously (Oberto et al., 1989), HK022 carries a copy of the insertion sequence IS903. Neither phage appears to encode any tRNA.

As expected from numerous earlier studies on lambdoid phages (Campbell, 1994), the order of genetic functions in the HK phages, to the extent they can be assigned, is the same as in λ and other members of the lambdoid group of phages. By assuming that this similarity of gene order applies as well to genes that cannot otherwise be assigned, we have assigned tentative functions to a number of other genes. This is particularly successful for the head and tail genes of HK97, which have a nearly one-to-one correspondence with the head and tail genes of λ, though not always any detectable sequence similarity. Tables 1 and 2 give complete lists of the genes we have identified for HK97 and HK022, respectively, together with our functional assignments and the basis for each assignment. In total, we have identified 60 genes in HK97 and 69 in HK022. Our scheme for naming these genes is described in Materials and Methods.
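The close-packing rule stated earlier in this subsection (start codon within 15 bp of, or overlapping, the upstream stop codon) can be checked directly against the coordinates in Table 1. The sketch below is illustrative only: it handles consecutive genes on the same strand, uses the first five rightward HK97 genes as input, and the function names are hypothetical.

```python
from typing import List, Tuple

# (gene, start, end) for HK97 genes 1-5, copied from Table 1. All five are
# transcribed rightward; coordinates are 1-based, inclusive, stop codon included.
HK97_GENES_1_TO_5: List[Tuple[str, int, int]] = [
    ("1", 50, 535),
    ("2", 542, 2056),
    ("3", 2056, 3330),
    ("4", 3348, 4025),
    ("5", 4028, 5185),
]


def intergenic_gaps(genes: List[Tuple[str, int, int]]) -> List[Tuple[str, str, int]]:
    """Bases between each upstream stop codon and the next start codon.

    Assumes consecutive genes on the same (rightward) strand. A gap of 0 means the
    genes abut; a gap of -n means they overlap by n bases.
    """
    return [(up[0], down[0], down[1] - up[2] - 1) for up, down in zip(genes, genes[1:])]


def tightly_packed(gap: int, limit: int = 15) -> bool:
    """True if the start codon lies within `limit` bases of, or overlaps, the upstream stop."""
    return gap <= limit


if __name__ == "__main__":
    for up, down, gap in intergenic_gaps(HK97_GENES_1_TO_5):
        print(f"{up}->{down}: gap {gap:3d} bp, tightly packed: {tightly_packed(gap)}")
```

For these rows the gaps are 6, -1, 17 and 2 bp. The one pair outside the 15 bp window, genes 3 and 4, is separated by a region that Table 1 annotates as a stem-loop (3309-3335), in line with the observation that most exceptions accommodate a transcription control signal.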

Genome sequence comparisons: genetic mosaicism

Ancestral recombination sites

Figure 1 shows the genome map of HK97, together with representations of the degree of sequence similarity across the genome between HK97 and HK022, λ, and P22. A striking feature of these comparisons is that different segments of the genomes match between two phages to different degrees. Thus, for example, λ and HK97 match each other at >95% DNA sequence identity in the area of the cI and cro genes, but on either side of this region there is a sharp transition to a segment in which the two genome sequences do not match detectably at all. A fundamental assumption in our interpretation of these results is that such transitions identify sites of recombination in the ancestry of one of the two phages being compared. Note that such a recombination event must necessarily be of the "non-homologous" or "illegitimate" variety, i.e. between two largely dissimilar sequences, since a homologous recombination event could not give rise to a transition in degree of similarity to a reference phage sequence. The evident result of these recombination events is that individual phages of this family are genetic mosaics, drawing sequences from a shared pool.
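The interpretation above treats an abrupt change in the level of similarity as the footprint of an ancestral crossover. The text does not spell out how such transitions were located, so the following is only a minimal sketch of one simple way to do it, not the procedure behind Figure 1. It assumes a pre-computed pairwise alignment with '-' for gaps and picks the alignment column where the identity of the left flank differs most from that of the right flank. The flank size and the function names are assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned columns where both sequences carry the same base (gaps never match)."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty, equal-length aligned segments")
    return sum(x == y and x != "-" for x, y in zip(a, b)) / len(a)


def best_breakpoint(a: str, b: str, flank: int = 50):
    """Column i maximizing |identity of a/b over [i-flank, i) - identity over [i, i+flank)|.

    Returns (i, difference, left identity, right identity), or None when the
    alignment is shorter than two flanks.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    best = None
    for i in range(flank, len(a) - flank + 1):
        left = identity(a[i - flank:i], b[i - flank:i])
        right = identity(a[i:i + flank], b[i:i + flank])
        score = abs(left - right)
        if best is None or score > best[1]:
            best = (i, score, left, right)
    return best


if __name__ == "__main__":
    # Synthetic alignment: identical for 60 columns, every base changed afterwards.
    a = "ACGT" * 25
    b = a[:60] + a[60:].translate(str.maketrans("ACGT", "CGTA"))
    print(best_breakpoint(a, b, flank=20))  # (60, 1.0, 1.0, 0.0)
```

Real alignments are noisier than this toy case, so in practice one would look at the whole flank-difference profile rather than a single maximum, and a mosaic genome can contain several such transitions.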

Recombination between genes

When we examine the sites of these sequence transitions, i.e. the putative sites of ancestral recombination, we find that the great majority of them are located at gene boundaries. Figure 2 illustrates this in a comparison between HK97 and λ in a portion of the region to the right of the DNA replication genes (the nin region in λ). The sequence shown includes the ORF 146 open reading frames (ORFs) of the two phages (HK97 gene 61), which are 96% identical in nucleotide sequence and are followed by genes that are non-homologous between HK97 and λ. The similarity between the shared genes persists until the last base of the termination codon, after which the sequences diverge completely. Figure 2 also shows a comparison between HK97 and P22 in the same region; in this case, the apparent recombination point is several codons upstream from the end of the genes. These examples are representative of what we see throughout the genomes: the putative crossover points are sometimes exactly at the gene boundaries but can occur a short distance away. In the case of genes that are separated by substantial amounts of non-coding sequence, e.g. cro and cII or Q and S, our analysis indicates different crossover points in different pairwise comparisons, located within the intergenic region.

The observations described above argue against a site-specific recombination mechanism directed at gene boundaries as the source of the sequence transitions. We favor an alternative model, in which the location of recombination events is not initially restricted, but the only phages that survive to be analyzed by us are those whose recombination events did not diminish the functionality of any phage-coded protein on which natural selection acts. This would presumably disfavor survival of the products of most (illegitimate) recombination events that occur within protein-coding regions.


Table 1. HK97 genes

Gene number | Alternative name | Orientation | Start | End (incl. stop codon) | Length (residues) | Similarities/function/comments | Basis for assignment
1 |  | R | 50 | 535 | 161 | Terminase small subunit | [3]
2 |  | R | 542 | 2056 | 504 | Terminase large subunit | [3]
3 | Portal | R | 2056 | 3330 | 424 | Head portal | [1]
Stem-loop |  | R | 3309 | 3335 |  |  | 
4 | Protease | R | 3348 | 4025 | 225 | Head maturation protease | [1]
5 | Head | R | 4028 | 5185 | 385 | Major head subunit | [1]
Stem-loop |  | R | 5192 | 5213 |  |  | 
6 |  | R | 5219 | 5545 | 108 | S-D across s-l? | [3]
7 |  | R | 5545 | 5883 | 112 | Putative head-tail adaptor (lambda FII analog) | [3]
10 |  | R | 5880 | 6329 | 149 | Lambda Z analog? | [3]
11 |  | R | 6326 | 6673 | 115 | Lambda U analog? | [3]
Stem-loop |  | R | 6692 | 6719 |  |  | 
12 |  | R | 6733 | 7437 | 234 | Major tail subunit | [1]
Stem-loop |  | R | 7443 | 7462 |  |  | 
13 |  | R | 7472 | 7858 | 128 | Tail assembly chaperone | [1, 3]
14 |  | R | 7834 | 8145 | 103 | Start at presumed frameshift site | [1, 3]
15 |  | R | 8203 | 8391 | 62 | Moron gene | [4]
Stem-loop |  | R | 8398 | 8422 |  |  | 
16 | H | R | 8435 | 11704 | 1089 | Tail length tape measure, lambda H homolog | [1]
17 | M | R | 11707 | 12045 | 112 | Lambda M homolog | [2]
18 | L | R | 12042 | 12800 | 252 | Lambda L homolog | [2]
19 | K | R | 12802 | 13512 | 236 | Lambda K homolog | [2]
20* |  | R | 13560 | 13784 | 74 | Moron gene with amber in codon 33 | [4]
Stem-loop |  | R | 13799 | 13824 |  |  | 
21 | I | R | 13834 | 14442 | 202 | Lambda I homolog | [2]
22 |  | R | 14475 | 14837 | 120 | Moron gene | [5]
Stem-loop |  | L | 14717 | 14738 |  |  | 
23 |  | L | 14975 | 14715 | 86 | Reverse moron gene | [5]
Stem-loop |  | R | 15107 | 15129 |  |  | 
24 | J | R | 15141 | 19031 | 1296 | Tail fiber, lambda J homolog | [2]
28 | stf | R | 19239 | 20204 | 321 | Tail fiber, lambda Stf homolog | [2]
29 | tfa | R | 20204 | 20812 | 202 | Lambda Tfa homolog | [2]
att |  |  | 21228 | 21248 |  | att site (core) | [2]
30 | int | L | 22388 | 21318 | 356 | Integrase | [2]
31 | xis | L | 22584 | 22366 | 72 | Excisionase | [2]
Stem-loop |  | L | 22612 | 22631 |  |  | 
37 |  | L | 23302 | 22772 | 176 |  | [4]
38 |  | L | 23463 | 23299 | 54 |  | [4]
39 | abc2 | L | 23767 | 23474 | 97 | P22 abc2 homolog | [2]
40 |  | L | 24175 | 23786 | 129 | Possible P22 abc1 analog | [3]
41 | erf | L | 24780 | 24175 | 201 | P22 erf homolog | [2]
42 |  | L | 24967 | 24791 | 58 |  | [5]
42.1 |  | L | 25042 | 24791 | 83 | Questionable gene | [5]
43 | kil | L | 25191 | 25039 | 50 |  | [3]
44 | cIII | L | 25310 | 25176 | 44 |  | [2]
Stem-loop |  | L | 25400 | 25377 |  |  | 
45 |  | L | 25975 | 25505 | 156 |  | [5]
47 | N | L | 26417 | 26034 | 127 | Closest to P22 "N", gp24 | [2]
48 |  | L | 27559 | 26909 | 216 |  | [5]
49 |  | L | 27813 | 27547 | 88 |  | [5]
50 | cI | L | 28952 | 28239 | 237 | Same immunity as lambda | [2]
51 | cro | R | 29053 | 29253 | 66 |  | [2]
52 | cII | R | 29391 | 29687 | 98 |  | [2]
53 |  | R | 29720 | 29881 | 53 |  | [5]
54 | O | R | 29868 | 30689 | 273 |  | [2]
55 | P | R | 30686 | 32062 | 458 | dnaB homolog | [2]
56 |  | R | 32059 | 32157 | 32 |  | [4]
60 |  | R | 32144 | 32356 | 70 |  | [5]
61 | orf | R | 32337 | 32777 | 146 | Recombination function | [2]
62 |  | R | 32774 | 33301 | 175 |  | [5]
63 |  | R | 33298 | 33480 | 60 |  | [4]
64 |  | R | 33477 | 33647 | 56 |  | [4]
65 | roi | R | 33640 | 34365 | 241 | DNA-binding protein | [2]
66 |  | R | 34365 | 34655 | 96 |  | [4]
67 |  | R | 34652 | 35014 | 120 | RusA homolog | [2]
68 |  | R | 35011 | 35199 | 62 |  | [4]
69 | Q | R | 35196 | 35819 | 207 | Late transcription regulator | [2]
70 | S | R | 36248 | 36568 | 106 | Holin, dual start motif | [2]
71 | R | R | 36552 | 37028 | 158 | Lysis, transglycosylase family | [2]
72 | Rz | R | 37025 | 37462 | 145 | Lysis | [2]
73 |  | R | 37672 | 38400 | 242 |  | [5]
74 |  | R | 39054 | 39338 | 94 |  | [5]

End of sequence: 39732

L and R indicate transcription in the leftward and rightward directions, respectively, in reference to the standard orientation of the map.

The basis of functional assignments given in the Table is indicated as follows: [1] experimental work on this gene in this phage; [2] sequence similarity to a gene of known function in a different phage; [3] no sequence similarity, but the position in the gene order is analogous to that of a gene of known function in another phage; [4] sequence similarity to a gene of unknown function; [5] no sequence match or other basis for assignment.
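The Length column follows directly from the coordinates: the span from Start to End, inclusive, is a whole number of codons, and one codon is the stop. The helper below is a hypothetical sketch that reproduces that arithmetic for either orientation, since leftward (L) genes are listed with Start greater than End, and spot-checks it against a few rows of Table 1.

```python
def residues(start: int, end: int) -> int:
    """Protein length implied by a table row (1-based, inclusive, stop codon included).

    Works for both orientations, since leftward (L) genes are listed with start > end.
    """
    span = abs(end - start) + 1
    if span % 3:
        raise ValueError(f"{span} bp is not a whole number of codons")
    return span // 3 - 1  # subtract the stop codon


if __name__ == "__main__":
    # (gene, start, end, listed length) for a few rows of Table 1.
    rows = [("1", 50, 535, 161), ("5", 4028, 5185, 385),
            ("30 int", 22388, 21318, 356), ("50 cI", 28952, 28239, 237)]
    for name, start, end, listed in rows:
        print(name, residues(start, end), residues(start, end) == listed)
```

As a usage note, the check raises for HK022 gene 14 (8129-8459, a 331 bp span). That row's Length entry is blank in Table 2, and its start is annotated as a presumed frameshift site.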


Support for this point of view comes from examining the few examples our comparisons do provide of such recombination events within coding regions.

Recombination within coding regions

Both HK97 and HK022 have generalized recombination genes that match those of P22. Figure 3 shows a comparison between the recombination regions of HK97 and P22. The two phages have similar sequences (92% amino acid identity) for the abc2 gene, which encodes a recombination-modulating function that interacts with the P22 Erf recombination protein and with cellular recombination functions (Murphy et al., 1987b; Poteete & Volkert, 1988). In the position of P22 gene abc1, which encodes another recombination-modulating protein, there is no matching sequence in HK97, but there is a sequence-dissimilar gene in that position. In the third P22-encoded recombination gene, erf, there is evidence for a recombination event within the coding region. The resulting sequence transition occurs at approximately codon 150 in this 200-codon gene, with the result that the amino-terminal three-quarters of the two Erf proteins are predicted to be 89% identical in amino acid sequence and the carboxy-terminal quarters are predicted to be 31% identical. Previous studies (Murphy et al., 1987a; Poteete et al., 1983) showed that Erf is a two-domain protein and that the two domains can be visualized separately by electron microscopy, separated by protease treatment, and assigned different functions. The domain boundary identified in the earlier studies corresponds to the location of the putative recombination site we see here, and we suggest that the hybrid protein produced by the recombination event allowed the phage carrying its gene to survive because the recombination occurred at a location, the domain boundary, that had minimal effect on the ability of the protein to carry out its functions.

(Note also that the sequence relationships shown in Figure 3 suggest a hypothesis about how the three recombination proteins work together, based on which sequences have segregated together through evolution; namely, that the large amino-terminal domain of Erf may interact with Abc2 while (more tentatively) the carboxy-terminal domain of Erf may interact with Abc1.)

A second example is derived from comparison of the λ and HK022 integrases. From the beginning of the integrase genes to codon 55, the two predicted proteins are identical in amino acid sequence (corresponding to 97% identity in nucleotide sequence), and following that point they are only 69% identical in amino acid sequence. As in the case of Erf, the transition point corresponds to the boundary between two domains of the protein. This observation has been reported and discussed previously (Yagil et al., 1989).

Table 2. HK022 genes

Gene number | Alternative name | Orientation | Start | End (incl. stop codon) | Length (residues) | Similarities/function/comments | Basis for assignment
1 |  | R | 50 | 535 | 161 | Small terminase subunit | [3]
2 |  | R | 542 | 2056 | 504 | Large terminase subunit | [3]
3 | Portal | R | 2056 | 3330 | 424 | Portal protein | [2]
Stem-loop |  | R | 3308 | 3335 |  |  | 
4 | Protease | R | 3348 | 4025 | 225 | Head maturation protease | [2]
5 | Maj. head | R | 4028 | 5185 | 385 | Major head subunit | [2]
Stem-loop |  | R | 5193 | 5213 |  |  | 
6 |  | R | 5219 | 5545 | 108 | Lambda FI analog?; S-D across s-l? | [3]
7 |  | R | 5545 | 5781 | 78 | Prob. head-tail adaptor (lambda FII analog) | [3]
8 |  | R | 5778 | 5975 | 65 | Unlikely S-D | [5]
9 |  | R | 5977 | 6309 | 110 |  | [5]
10 |  | R | 6302 | 6841 | 179 |  | [5]
11 |  | R | 6838 | 7203 | 121 |  | [5]
Stem-loop |  | R | 7219 | 7241 |  |  | 
12 |  | R | 7258 | 7758 | 166 | Major tail subunit | [1]
Stem-loop |  | R | 7763 | 7782 |  |  | 
13 |  | R | 7797 | 8282 | 161 | Tail assembly chaperone | [1, 3]
14 |  | R | 8129 | 8459 |  | Start at presumed frameshift site | [1, 3]
16 | H | R | 8478 | 10898 | 806 | Tail length tape measure; lambda H homolog | [1]
17 | M | R | 10898 | 11236 | 112 | Lambda M homolog | [2]
18 | L | R | 11233 | 11988 | 251 | Lambda L homolog | [2]
19 | K | R | 11990 | 12700 | 236 | Lambda K homolog | [2]
20 |  | R | 12748 | 12972 | 74 | Moron gene | [4]
Stem-loop |  | R | 12987 | 13010 |  |  | 
21 | I | R | 13022 | 13630 | 202 | Lambda I homolog | [2]
22 |  | R | 13652 | 13831 | 59 | Moron gene | [5]
Stem-loop |  | L | 13723 | 13746 |  |  | 
23 | srb | L | 13751 | 13972 | 73 | Moron gene | [1]
Stem-loop |  | R | 14102 | 14125 |  |  | 
24 | J | R | 14137 | 17688 | 1183 | Tail fiber, lambda J homolog | [2]
25 |  | R | 17690 | 17992 | 100 |  | [4]
26 |  | R | 17992 | 18630 | 212 |  | [4]
27 | cor | R | 18738 | 18971 | 77 | Moron gene; phage exclusion | [1]
Stem-loop |  | R | 18991 | 19013 |  |  | 
28 | stf | R | 19030 | 20130 | 366 | Tail fiber, lambda Stf homolog | [2]
att |  |  | 20504 | 20730 |  | attP | [1]
30 | int | L | 21797 | 20724 | 357 | Integrase | [1]
31 | xis | L | 21993 | 21775 | 72 | Excisionase | [1]
Stem-loop |  | L | 22018 | 22041 |  |  | 
32 |  | L | 22442 | 22098 | 114 | N15 early protein homolog; deleted in lambda | [4]
Stem-loop |  | L | 22476 | 22457 |  |  | 
33 |  | L | 22648 | 22469 | 59 | Undoc. homolog present in lambda | [4]
Stem-loop |  | L | 22705 | 22679 |  |  | 
34 |  | L | 22992 | 22708 | 94 | Homolog in 21 | [4]
35 |  | L | 23350 | 22985 | 121 | Homolog of P22 EaA C-term half | [4]
36 |  | L | 23712 | 23347 | 121 |  | [5]
37 |  | L | 24421 | 23714 | 235 | Spotty similarity to lambda Ea22 | [4]
38 |  | L | 24582 | 24418 | 54 |  | [4]
39 | abc2 | L | 24886 | 24593 | 97 | P22 abc2 homolog | [2]
40 |  | L | 25293 | 24910 | 127 | Likely analog of P22 Abc1 | [3]
41 | erf | L | 25898 | 25293 | 201 | P22 erf homolog | [2]
42 |  | L | 26079 | 25909 | 56 |  | [5]
42.1 |  | L | 26158 | 25988 | 56 | Questionable gene | [5]
43 | kil | L | 26307 | 26155 | 50 | Kil | [3]
44 | cIII | L | 26426 | 26292 | 44 | CIII | [2]
Stem-loop |  | L | 26516 | 26493 |  |  | 
46 | Transposase | L | 27580 | 26657 | 307 | IS903 | [2]
47 | nun | L | 28057 | 27725 | 110 | Nun | [1]
50 | cI | L | 29129 | 28422 | 235 | CI | [1]
51 | cro | R | 29209 | 29439 | 76 | Cro | [1]
52 | cII | R | 29561 | 29857 | 98 | CII | [1]
53 |  | R | 29890 | 30036 | 48 |  | [5]
54 | O | R | 30029 | 30928 | 299 | Sim. to SPP1 DNA replication protein | [2]
55 | P | R | 30918 | 32354 | 478 | P22 12, E. coli dnaB homolog; DNA replication | [2]
56 |  | R | 32354 | 32458 | 34 |  | [4]
57 |  | R | 32436 | 33047 | 203 | Unlikely S-D | [5]
Stem-loop |  | R | 33047 | 33079 |  |  | 
58 |  | R | 33095 | 33301 | 68 |  | [5]
59 |  | R | 33319 | 33636 | 105 |  | [5]
60 |  | R | 33641 | 33895 | 84 |  | [5]
61 | orf | R | 33876 | 34316 | 146 | Recombination function | [2]
63 |  | R | 34313 | 34495 | 60 |  | [4]
64 |  | R | 34492 | 34662 | 56 |  | [4]
65 | roi | R | 34655 | 35380 | 241 | DNA-binding protein | [1]
66 |  | R | 35380 | 35670 | 96 |  | [4]
67 |  | R | 35667 | 36029 | 120 | RusA homolog | [2]
68 |  | R | 36026 | 36214 | 62 |  | [4]
69 | Q | R | 36211 | 36834 | 207 | Lambda Q homolog; late regulation | [1]
70 | Holin | R | 37267 | 37590 | 107 | Holin, dual start motif | [2]
71 | Lysin | R | 37574 | 38050 | 158 | Lysis, transglycosylase family | [2]
72 | Rz | R | 38047 | 38484 | 145 | Lysis | [2]
73 |  | R | 38691 | 39419 | 242 |  | [5]
74 |  | R | 40072 | 40413 | 113 |  | [5]

End of sequence: 40751

L and R indicate transcription in the leftward and rightward directions, respectively, in reference to the standard orientation of the map.

The basis for functional assignments is as indicated in Table 1.

A somewhat different example of recombination within a coding sequence comes from the head portal and protease genes, genes 3 and 4, of HK97 and HK022 (corresponding to λ genes B and C). Over the surrounding portions of the head gene region these two phages are nearly identical (~99.8% nucleotide identity), and this makes it possible to detect a ~700 bp stretch of sequence for which the match is lower but still very high, at 89% identity. The section of reduced similarity spans from codon ~326 of gene 3 to codon ~128 of gene 4. It appears that an ancestor of one of these phages enjoyed a recombination event with another phage carrying very similar but non-identical portal and protease genes, with the result that a segment of the second pair of these genes was substituted for the resident segment. The 76 nucleotide differences in this region give rise to six conservative amino acid substitutions: three Ser to Ala, one Ala to Ser, one Leu to Ile, and one Met to Leu. The two slightly different versions of the portal and protease genes that resulted from this process, i.e. the HK97 and HK022 versions, are evidently both functional. We suggest that the substitution was successful, not because the substituted region defines an independent functional unit as in the cases cited above, but simply because the substitutions that resulted were functionally neutral. As in the cases above, this example is compatible with the hypothesis that recombination events can occur virtually anywhere in the genome but that the resulting recombinants survive only if biological function is not disrupted.
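The portal-protease example turns on the difference between nucleotide and amino acid changes. Seventy-six nucleotide differences in roughly 700 bp (about 1 - 76/700 ≈ 0.89, in line with the 89% identity quoted) translate into only six amino acid substitutions. Six changed codons can hold at most 18 of those differences, so most of the differences must be synonymous. Below is a minimal sketch of that bookkeeping. It assumes two already aligned, in-frame, gap-free coding sequences and the standard genetic code. The function name and the toy sequences are illustrative and are not taken from the phage data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def compare_coding(a: str, b: str):
    """Compare two aligned, in-frame, gap-free coding sequences of equal length.

    Returns (number of nucleotide differences, list of amino acid changes), where each
    change is (1-based codon index, amino acid in a, amino acid in b). Codons that
    differ but encode the same amino acid (synonymous changes) are not listed.
    """
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("need equal-length, in-frame sequences")
    nt_diffs = sum(x != y for x, y in zip(a, b))
    aa_changes = []
    for i in range(0, len(a), 3):
        aa_a, aa_b = CODE[a[i:i + 3]], CODE[b[i:i + 3]]
        if aa_a != aa_b:
            aa_changes.append((i // 3 + 1, aa_a, aa_b))
    return nt_diffs, aa_changes


if __name__ == "__main__":
    # Toy example: one Ser->Ala change (TCT->GCT) and one synonymous Ala change (GCA->GCG).
    print(compare_coding("ATGTCTGCA", "ATGGCTGCG"))  # (2, [(2, 'S', 'A')])
```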

It was first observed in the comparisons of lambdoid phage genomes by heteroduplex mapping (Simon et al., 1971) that sequence transitions of the sort we are discussing are much sparser in the head and tail genes of these phages than in the early genes. Our comparisons corroborate this observation (see Figure 1). The explanation generally given for this lack of recombinants, with which we agree, is that the head and tail genes encode groups of proteins that have co-evolved to interact intimately in the structures they build, and that mixing head genes (for example) from two different phages is likely to produce a non-functional group of head proteins (each member of which would be functional in its native context). The portal-protease example described above argues that recombination between different phages can occur and persist in the head genes if it does not disrupt function. We do see several more substantial sequence transitions in our comparisons of the head and tail regions of λ, HK97 and HK022. These are described and discussed below. (There is no detectable sequence similarity between P22 and any of the other three phages in the head and tail regions.)

Evolutionary relationships and phage biology

In addition to illuminating the general patterns in which putative recombination sites are found, as described above, comparisons of the sequences of these phages yield a number of independent examples in which differences between genomes are apparently informative about some aspect of phage function or phage evolution. We present these below, roughly in the order of their appearance on the genomes.

Figure 1. HK97 map, showing sequence similarities to other phages. The kilobase scale represents the HK97 genome, and the boxes show gene positions. Shaded genes are transcribed leftward and open genes are transcribed rightward. The colored histograms show the degree and locations of sequence similarity between HK97 and the indicated phages. The locations of the matches are reflected above the genes. Numbers below the histograms preceded by the symbol ~ or + indicate positions of a deficiency or surplus, respectively, of the indicated number of base-pairs in HK97 relative to the comparison sequence.

Figure 2. Recombination joints at the level of nucleotide sequence. The sequence alignments shown extend from within the coding regions of ORF 146 of each of the indicated phages and into the following coding regions. ORF 146 in HK97 is gene 61.

Possible portable expression units: morons

The head and tail gene regions of λ and HK97 share weak sequence similarity over slightly less than half of their length. Through this region, and in the region with no detectable similarity between the phages, there is a striking similarity in the apparent organization of genes (Figure 4(a)). Thus, for every head or tail gene in λ, there is a corresponding HK97 gene of similar size and, at least in the cases where information is available, analogous function. The only significant exception to this (beyond some remodeling of gene arrangements around the capsid subunit genes; Duda et al., 1995a) is that HK97 has three segments of DNA that are absent from λ. These may be regarded formally as DNA segments that were inserted into a λ-like ancestor to produce HK97, or as segments that were deleted from an HK97-like ancestor to produce λ. Figure 4(b) shows the sequence of one of these regions, from the gene 14-15-16 region of HK97. Gene 14 corresponds to λ gene T, and gene 16 corresponds to λ gene H. The termination codon of gene T overlaps the initiation codon of gene H in λ, so the "inserted" DNA in HK97 is roughly the 289 bp between genes 14 and 16. There is one plausible gene sequence within this region (gene 15), which has a good Shine-Dalgarno sequence and is predicted to encode a protein of 62 amino acid residues (see Figure 4(b)). Between gene 15 and the two adjacent genes there are uncharacteristically large non-coding spaces: 57 bp between genes 14 and 15 and 43 bp between genes 15 and 16. However, we find sequences in these spaces that we interpret provisionally as transcription control signals. Thus, starting at the sixth base-pair after the termination codon of gene 14 is a plausible -35 region of a σ70 promoter, followed by a canonical

17 bp spacing and a plausible -10 region. The transcription start site of this promoter would be 6 bp before the Shine-Dalgarno sequence of gene 15. On the downstream side of gene 15 there is an inverted repeat in the sequence followed by eight thymine residues; the corresponding RNA would be expected to form a stable stem-loop structure, and this sequence therefore has the characteristics of a ρ-independent transcription terminator.
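To make the promoter criterion concrete, the following Python sketch scans one strand for σ70-like promoters. It is an illustration only, not the procedure used to annotate these genomes: the consensus hexamers TTGACA and TATAAT, the fixed 17 bp spacer and the mismatch allowance (`max_mismatches`, a hypothetical parameter) are all assumptions. More careful searches typically score positions with weight matrices and tolerate spacers of roughly 15-19 bp.

```python
# Minimal sketch: scan one DNA strand for sigma-70-like promoters.
# Assumptions (not from the paper): consensus hexamers TTGACA (-35) and
# TATAAT (-10), a fixed 17 bp spacer, and a per-hexamer mismatch allowance.
from typing import List, Tuple

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
SPACER = 17


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq: str, max_mismatches: int = 2) -> List[Tuple[int, int, int]]:
    """Return (0-based start of -35 box, mismatches in -35, mismatches in -10).

    Only the strand given is searched; pass the reverse complement to
    search the other strand.
    """
    seq = seq.upper()
    span = len(MINUS35) + SPACER + len(MINUS10)
    hits = []
    for i in range(len(seq) - span + 1):
        box35 = seq[i:i + 6]
        box10 = seq[i + 6 + SPACER:i + 12 + SPACER]
        m35 = mismatches(box35, MINUS35)
        m10 = mismatches(box10, MINUS10)
        if m35 <= max_mismatches and m10 <= max_mismatches:
            hits.append((i, m35, m10))
    return hits


# Toy input: a perfect consensus promoter with a 17 bp spacer.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(scan_sigma70(toy))  # [(2, 0, 0)]
```

A spacer of 16 bp in this toy would produce no hit, which is the main limitation of a fixed-spacer scan.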

Figure 3. Sequence similarities in the recombination gene region. The Figure shows the genes responsible for homologous recombination in P22 and the corresponding gene of HK97. The shaded rectangles indicate the extent of high levels of sequence similarity.

On the basis of these features of the sequence, this ~300 bp "insert" has the appearance of being a genetic module, with a transcription promoter and terminator flanking a protein coding gene. Given these properties and its location and orientation, we expect that gene 15 would be expressed from a repressed HK97 prophage, driven from its associated promoter, as well as during lytic growth of the phage, both from its own promoter and from the phage late promoter. We have no direct evidence to indicate whether these putative transcription control elements actually function as such. However, our confidence that the sequences flanking gene 15 are functionally meaningful is increased by the fact that we find a sequence organization similar to that described here in five other DNA segments in HK97 and HK022 (described below). Each of these segments, like the gene 15-containing segment, is located between two genes whose homologs in one of the other phages in this reference group are adjacent; that is, the segment can be thought of formally as having been inserted between two genes of an ancestral phage. To provide a simple way to refer to these DNA segments, we propose to give them the name moron, to indicate the fact that when one is present in the genome there is more DNA than when it is not present.
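The dimensions given for the HK97 gene 15 moron are mutually consistent, which a short calculation confirms. This Python sketch assumes that the 62-residue protein is encoded by 62 sense codons (including the initiating ATG) plus one stop codon, and that the quoted 57 bp and 43 bp spaces exclude the flanking coding regions; the function names are ours.

```python
# Worked check of the HK97 gene 15 moron dimensions quoted in the text.
# Assumption: the ORF is 62 sense codons (start codon included) + 1 stop codon.


def coding_length_bp(n_residues: int, include_stop: bool = True) -> int:
    """Length in bp of an ORF encoding n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))


def moron_length_bp(left_spacer: int, n_residues: int, right_spacer: int) -> int:
    """DNA between the two flanking genes: spacer + ORF + spacer."""
    return left_spacer + coding_length_bp(n_residues) + right_spacer


print(coding_length_bp(62))         # 189
print(moron_length_bp(57, 62, 43))  # 289, the size of the "insert" between genes 14 and 16
```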

HK022 has three morons (Figure 4(a)). Considering first the rightmost, this sequence contains a homolog (76 % and 64 % amino acid sequence identity) of the cor genes of phages φ80 and N15, respectively, flanked by putative promoter and terminator elements as in the gene 15 moron. The cor gene in φ80 is reported to prevent superinfection of a φ80 lysogen by φ80 or other phages that use the same cell-surface receptor, probably by encoding a protein that goes to the outer membrane of the cell and interacts with the receptor (Matsumoto et al., 1985). On this evidence, cor must be expressed from the repressed prophage, in accordance with our prediction about expression of morons.

HK022 has a second moron, organized around gene 20, which lies between the homologs of λ tail genes K and I. Gene 20 is predicted to encode a 73 amino acid residue protein, and it is flanked by a putative promoter and terminator as in the other cases. Sequence searches show that this moron has sequence similarity to two morons in HK97. First, the protein sequences encoded by the HK022 gene 20 moron and the HK97 gene 15 moron are weakly but significantly related (48 % amino acid identity over 46 residues). HK022 does not have a moron (or any gene) at the position of gene 15 in HK97, so we might speculate that whatever function is provided to HK022 by its gene 20 is provided to HK97 by its gene 15. The second sequence match to the HK022 gene 20 moron is in the corresponding position in HK97, i.e. between genes 19 and 21. In this case, the sequence similarity is high (99 % nucleotide sequence identity, extending through the entire moron). However, the part of the sequence corresponding to gene 20 has an amber termination codon in place of a glutamine codon at codon 33. We speculate that HK97 may have had an intact "gene 20" in the past but that it was inactivated by mutation. If we assume that the intact gene 20 provided some advantageous function to the phage, and that the homologous gene 15 duplicated that function, then HK97 may have lost its gene 20 subsequent to its acquisition of the gene 15 moron, either because there was no selective advantage to maintaining redundant functions or possibly because the two proteins interfered with each other's function.
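A single C to T transition is enough to turn a glutamine codon (CAG) into an amber stop (TAG). The Python sketch below illustrates the consequence using the standard genetic code and a made-up 73-codon ORF with CAG at codon 33; the sequence is a placeholder, not the real HK022 gene 20 or HK97 "gene 20" sequence.

```python
# Sketch: a C->T change at the first base of codon 33 (CAG, Gln -> TAG, amber)
# truncates the product. The ORF is a hypothetical placeholder sequence.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, indexed as 16*first + 4*second + third over TCAG.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(orf: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitute(seq: str, position: int, base: str) -> str:
    """Return seq with the base at 0-based position replaced."""
    return seq[:position] + base + seq[position + 1:]


# Toy 73-codon ORF; codon n starts at 0-based position 3 * (n - 1).
orf = "ATG" + "GCT" * 31 + "CAG" + "GCT" * 40 + "TAA"
mutant = substitute(orf, 3 * 32, "T")  # codon 33: CAG -> TAG

print(len(translate_until_stop(orf)))     # 73
print(len(translate_until_stop(mutant)))  # 32
```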

Finally, there are moron-like sequences in both HK97 and HK022 between genes 21 and 24. These are more complex in structure than the morons described above. Although they are embedded in a region of the genomes where the two phages are highly similar in sequence, the two morons are largely dissimilar in sequence though similar in structure. Figure 4(c) illustrates their structures and sequence relationships. Both phages have an apparent gene starting soon after the end of the upstream gene 21; however, we do not detect any sequence that could be considered a promoter in either case, so we expect that these genes (genes 22) would be expressed under the same transcriptional control as the majority of the genes in the late operon. Overlapping the ends of these genes, both phages have a small gene in the opposite orientation, flanked in moron fashion by a putative promoter at its right (upstream) end and a putative terminator at its left (downstream) end. In the case of HK022, this backwards-oriented gene is a previously characterized gene called srb (Atkinson & Gottesman, 1992), which has a function related to initiation of transcription from the phage's late operon promoter. We would expect that srb (and the corresponding gene 23 in HK97) would be expressed from its promoter in a repressed prophage. This transcription would also presumably occur during lytic growth of the phage, but translation of the srb mRNA might be shut off late in the life-cycle by an antisense effect of the opposing late operon transcript.

Figure 4. (a) Left arms of λ, HK97 and HK022. The green lines indicate homologies based on sequence similarities. Orange lines indicate proposed analogies based on functional properties and gene order or, for the dotted orange lines, gene order alone. Proposed promoters and terminators are indicated by green and red symbols, respectively. (The lom gene, which does not have an obvious promoter, is considered below, in the Discussion.) The morons are the colored genes with their associated promoters and terminators. All genes shown are transcribed from left to right with the exception of moron genes 23, colored blue. Note that in most published maps of λ the stf gene is shown in a mutated form as two separate open reading frames, ORF 401 and ORF 314, rather than a single one as shown here (Hendrix & Duda, 1992). (b) HK97 gene 15 moron. The Figure shows the proposed coding region of gene 15, the proposed promoter and terminator elements, and their relationships to the flanking genes. (c) Complex morons. The region between gene 21 (λ I homolog) and gene 24 (λ J homolog) is shown for HK97, HK022 and λ. Genes above the line are transcribed rightward and genes below the line leftward. Proposed promoters and terminators are indicated by symbols.

As suggested above, a simple sequence comparison cannot distinguish between the hypothesis that the presence of a moron reflects a past DNA insertion event and the hypothesis that absence of a moron reflects a past deletion event. However, when we calculate the G + C base composition across the genomes of both HK phages, we find dips in the curves at the positions of the morons (Figure 5). For the gene 15 moron of HK97, for example, the moron itself has 37 % G + C, but the 1000 bp stretches flanking it on the left and right have 55 % and 52 % G + C, respectively. Differences in G + C content of this sort have been used in other systems to infer that the DNA segment with the atypical G + C content entered the genome from some external source relatively recently in evolutionary time (Lawrence & Ochman, 1997). On this basis, we suggest that the morons are present in the HK phages as the result of insertion into an ancestral phage that lacked them, the first of the two hypotheses stated above.
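A sliding-window G + C scan of the kind used to locate these dips can be sketched in a few lines of Python. The 300 bp default window follows the Figure 5 legend; the step size, the function names and the toy genome (a low-G + C "moron" between G + C-rich flanks) are assumptions for illustration and are not the data behind Figure 5.

```python
# Sketch of a sliding-window G+C scan. Window size follows the Figure 5
# legend (300 bp); step size and the toy sequence are illustrative assumptions.
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_scan(seq: str, window: int = 300, step: int = 1) -> List[Tuple[int, float]]:
    """Return (window midpoint, G+C percent) for every full window."""
    out = []
    for start in range(0, len(seq) - window + 1, step):
        pct = 100.0 * gc_fraction(seq[start:start + window])
        out.append((start + window // 2, pct))
    return out


# Toy genome: 1000 bp G+C-rich flank, 300 bp A+T-only insert, 1000 bp flank.
flank = "GC" * 500
insert = "AT" * 150
genome = flank + insert + flank
scan = gc_scan(genome, window=300, step=50)
lowest = min(scan, key=lambda point: point[1])
print(lowest)  # (1150, 0.0): the dip is centred on the inserted segment
```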

Figure 5. Scans of G + C content across the left arms of HK97 and HK022. The guanine + cytosine content, as a percentage, was determined with a sliding 300 bp window and is plotted as a function of genome position. The dark rectangles along the bottoms of the plots show the positions and sizes of the moron genes, and the bars extending out from them show the extents of the surrounding non-coding regions. The open rectangles represent the tail fiber genes J and 28; they are included here to show their relationships to some of the sequence regions that have low G + C content but are not associated with morons. The sequences of the J proteins of the two phages are 98 % identical up to the positions of the small vertical lines in the genes, and not detectably related following those points.

Other sharp deviations from the average G + C content of the left arms of these phages occur at the attachment sites and at the ends of some of the tail fiber genes, which are indicated in Figure 5 as open boxes. We have no evidence regarding the biological significance of the low G + C content at the attachment sites, but we note that the same pattern is seen in other phages we have examined, including lambdoid phages λ and P22, and mycobacteriophages L5 and D29. We discuss the tail fiber genes further below.

Other possible transcription terminators

In addition to the putative transcription terminators that follow the genes of morons, there are several other potential terminator structures at other locations in the HK97 and HK022 genomes, principally but not exclusively among the virion structural genes. Like the terminators associated with morons, the other terminators are located in non-coding regions between genes, except for three, which overlap the end of the upstream coding region. The primary nucleotide sequences of the terminators (both those associated with morons and those not) are not generally closely related, but the predicted structures of the RNA stem-loops are very similar: uninterrupted stems 5-8 bp long and rich in G-C base-pairs, four-base loops (with a few larger loops), followed by several U bases, sometimes interrupted by one or two other bases, often A. There are 11 of these in HK97 and 13 in HK022. In addition, of the five potential stem-loop structures identified in the late genes of λ by Sanger et al. (1982), four (following genes D, E, I and lom) fit the pattern of the structures we see in the HK phages. We find three additional examples in the right arm of λ, and six such structures in P22. Figure 6 shows the sequences of these 37 potential stem-loop structures. We suggest that the strong conservation of the general features of these structures, including their locations with respect to gene boundaries, argues that they are maintained under positive selection. We consider below what their biochemical function might be.
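Because the shared features are structural rather than sequence-specific, they lend themselves to a simple pattern search. The Python sketch below is ours and only loosely follows the criteria described above; the loop range of 4-6 nt, the stem G + C threshold (`min_stem_gc`) and the T-rich tail test (`tail_len`, `min_tail_t`) are assumed values, not the cutoffs used to compile Figure 6. It requires a perfect stem, reads DNA (T for U) on one strand only, expects only A, C, G and T, and reports every nested candidate.

```python
# Sketch: find rho-independent terminator-like hairpins on one DNA strand.
# Criteria loosely follow the text (perfect 5-8 bp G+C-rich stem, short loop,
# T-rich tail); all thresholds are illustrative assumptions.
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(s: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def find_terminators(seq: str, stems=range(5, 9), loops=range(4, 7),
                     min_stem_gc: float = 0.6, tail_len: int = 8,
                     min_tail_t: int = 5) -> List[Tuple[int, int, int]]:
    """Return (0-based start, stem length, loop length) of candidate hairpins."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for stem in stems:
            for loop in loops:
                end = start + 2 * stem + loop  # first base after the 3' arm
                if end + tail_len > len(seq):
                    continue
                left = seq[start:start + stem]
                right = seq[start + stem + loop:end]
                if right != revcomp(left):
                    continue  # stem arms must pair perfectly
                gc = sum(b in "GC" for b in left) / stem
                tail_t = seq[end:end + tail_len].count("T")
                if gc >= min_stem_gc and tail_t >= min_tail_t:
                    hits.append((start, stem, loop))
    return hits


# Toy terminator: 6 bp G+C stem, TTCG loop, then eight T residues.
toy = "AAAA" + "GCCGCC" + "TTCG" + revcomp("GCCGCC") + "T" * 8 + "AAAA"
print((4, 6, 4) in find_terminators(toy))  # True
```

Nested sub-hairpins of the same structure are also reported; a fuller treatment would keep only the most stable candidate, for example by computing folding free energies.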

Figure 6. Predicted stem-loop sequences of lambdoid phages. All of the predicted stem-loop terminators listed in Tables 1 and 2, as well as corresponding sequences from λ and P22, are shown here, oriented so that local transcription goes rightward. Gene names refer to the upstream genes. Letters highlighted in red are the termination codons of the upstream gene (with two examples of the complement of the termination codon of a gene converging from the right); green letters are the initiation codons for the downstream gene. Pink indicates probable Shine-Dalgarno sequences. Inverted repeats are underlined and the stem and oligo(U) regions of the proposed terminators are shown in two shades of blue.

Discontinuity at the head-tail junction of HK97 and HK022

It was noticed in the early DNA heteroduplex comparisons of lambdoid phage sequences (Simon et al., 1971) that a sequence transition within the virion structural genes often occurs roughly at the boundary between the head and tail genes, and we see another example of this in the comparison between HK97 and HK022. If the explanation cited above for why successful recombination events may be rare in the structural genes is correct, namely that the proteins that make up the virion structure have co-evolved while maintaining intimate protein-protein interactions with each other in the structure, and that they therefore cannot be mixed successfully with different sets of structural proteins, then it is unclear why recombination events among these genes would ever survive. Examination of the HK97/HK022 sequence transition suggests an explanation. The transition occurs not at a gene boundary but within the coding sequences of gene 7 of the two phages. We have assigned gene 7 as a likely homolog of λ gene FII, a gene that has been studied in some detail (Casjens, 1974). The product of gene FII, gpFII, is the last protein to join to heads, and it forms a binding site and determines binding specificity for tails. We suggest that the amino-terminal portions of the HK97 and HK022 gp7 proteins, which are the same in the two phages, include the binding site for the heads, which are also the same in the two phages, and that the carboxy-terminal portions of the proteins, which are different, include the binding site for the tails, which also differ between the phages.

Possible conservation of a translational frameshift site

In phage λ, gene V, which encodes the tail shaft subunit, is followed by two overlapping open reading frames, G and T, which are expressed by a translational frameshift mechanism: gene G is translated conventionally at a high level, while T, which is not translated independently, is expressed when one ribosome out of every 30 translating G undergoes a -1 frameshift into the T reading frame on arriving at a "slippery" sequence in the mRNA located seven and eight codons from the end of the G coding region (Levin et al., 1993). The result is that a large amount of gpG is made along with a small amount of a fusion protein, gpG-T, containing nearly the entire amino acid sequence of gpG fused to the amino acid sequence encoded in the T open reading frame.
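The effect of a -1 frameshift on codon reading can be illustrated with a deliberately simplified Python model in which a ribosome that reaches a designated codon steps back one nucleotide before continuing. The mRNA, the slip position and the function name are invented for illustration; real slippage involves tRNA re-pairing on the slippery sequence, which this sketch does not model. Only the 1-in-30 frequency comes from the text.

```python
# Simplified model of -1 programmed ribosomal frameshifting (lambda G/T style).
# The toy mRNA and slip position are hypothetical; the ~1-in-30 frequency is
# the figure quoted in the text.
from typing import List, Optional

STOPS = {"TAA", "TAG", "TGA"}
FRAMESHIFT_FREQUENCY = 1 / 30  # fraction of ribosomes translating G that shift


def read_codons(mrna: str, slip_at: Optional[int] = None) -> List[str]:
    """Read codons from position 0 up to and including the first stop codon.

    If slip_at is given and the ribosome reaches a codon starting there,
    it first moves back one nucleotide (a -1 frameshift), at most once.
    """
    codons = []
    pos = 0
    while pos + 3 <= len(mrna):
        if slip_at is not None and pos == slip_at:
            pos -= 1
            slip_at = None
        codon = mrna[pos:pos + 3]
        codons.append(codon)
        if codon in STOPS:
            break
        pos += 3
    return codons


mrna = "ATGAAATAAGCTGGTAG"  # toy sequence, written with T for U
print(read_codons(mrna))             # ['ATG', 'AAA', 'TAA'] (zero frame, "G")
print(read_codons(mrna, slip_at=6))  # ['ATG', 'AAA', 'ATA', 'AGC', 'TGG', 'TAG']
print(round(1000 * FRAMESHIFT_FREQUENCY))  # 33 shifted ribosomes per 1000
```

In this toy, the shifted ribosome reads past the zero-frame stop codon and produces a longer, fused product, which is the qualitative behavior of gpG-T.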

The HK97 and HK022 sequences are not detectably related to λ, or to each other, in the region corresponding to λ G and T. Nevertheless, they appear to have a gene structure similar to the λ G/T arrangement in their genes 13 and 14. There is preliminary evidence that expression by way of a translational frameshift applies in HK97 and HK022 as in λ (J. Xu, personal communication). A similar frameshifting mechanism appears to be present in what may be homologous tail genes in the mycobacteriophages L5 and D29 and Streptomyces phage φC31 (Ford et al., 1998; Hatfull & Sarkis, 1993; Smith et al., 1999). Evidently, this unusual mechanism of gene expression has been preserved in what we presume to be homologous tail genes, in the absence of any recognizable primary sequence similarity.

Tail-length tape measure protein genes

In λ, the length to which the tail shaft polymerizes is determined by a template or tape measure protein, gpH, and changes in the length of gpH caused by deletions or insertions in the coding region have corresponding effects on the length of the tail (Katsura & Hendrix, 1984; Katsura, 1987). HK022 and HK97 have tails that are approximately 10 % shorter and 18 % longer, respectively, than λ (R.W.H., unpublished results). They are therefore expected to have tape measure proteins whose lengths differ correspondingly from that of λ gpH, and candidate proteins for these roles have been seen in virions (M. Popa & R.W.H., unpublished results). Examination of the HK022 and HK97 genome sequences reveals that they have genes of the appropriate sizes to encode the putative tape measure proteins (genes 16). These genes are in the same position in the gene order as λ gene H, and they are homologs of gene H by the criterion that a portion of their predicted proteins shares amino acid sequence similarity with λ gpH. The region of shared sequence extends over the carboxy-terminal regions of the proteins, indicating that variations in tape measure length among these phages are located in the amino-terminal portion of the proteins.
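The expectation stated above can be made quantitative with a back-of-envelope calculation. The Python sketch below assumes, as a simplification, that tail length scales linearly with tape measure length; `REFERENCE_LEN` is a hypothetical placeholder standing in for the length of λ gpH, not its actual value, and only the tail length ratios are taken from the text.

```python
# Back-of-envelope sketch: expected tape measure sizes if tail length scales
# linearly with tape measure length (an assumption). REFERENCE_LEN is a
# hypothetical placeholder, not the real length of lambda gpH.
TAIL_LENGTH_RATIO = {"HK022": 0.90, "HK97": 1.18}  # relative to lambda (text)


def expected_tape_measure(reference_len: int, ratio: float) -> int:
    """Scale a reference tape measure length by a tail length ratio."""
    return round(reference_len * ratio)


REFERENCE_LEN = 1000  # hypothetical reference length, in residues
for phage, ratio in TAIL_LENGTH_RATIO.items():
    print(phage, expected_tape_measure(REFERENCE_LEN, ratio))
# HK022 900
# HK97 1180
```

Substituting the real λ gpH length for the placeholder would give the sizes to look for among candidate genes 16.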

Apparent mutations in a λ tail gene

Figure 7. Lambda gene K evolution. The upper part of the diagram shows the L, K and I genes of λ and the corresponding region of HK022. The shading shows the sequence similarity relationships. The lower part of the diagram shows the sequences expanded from the indicated regions, with translations in one or three frames. The circled amino acid symbols show identities between the two sequences.

The majority of genes in the head and tail gene regions of these phages are arranged with the ends of their coding regions tightly abutted to those of the adjacent genes. Of the exceptions to this rule, most are cases where extra space between coding regions is occupied by the putative transcription terminator structures or moron promoters discussed above. Gene K of λ fits neither of these prescriptions: there are 148 bp between the end of the upstream gene L and the beginning of gene K, and the end of gene K overlaps the beginning of gene I by 103 bp. In HK022 (and HK97, which is 96 % identical with HK022 in this region), the situation is more usual: genes 18 and 19 (the homologs of λ L and K) are spaced apart by 1 bp and gene 19 is followed closely by a moron. The relationships among these genes become clearer when we align them. Figure 7 shows the region around the gene L/K (18/19) junction, with a three-frame translation for the λ sequence and a one-frame translation of the HK022 sequence. It is apparent from this representation of the sequences that the beginning of the HK022 gp18 sequence matches the predicted translation of the λ sequence in two different reading frames; this frameshift would be resolved, and the differences between the start points of the genes therefore reconciled, if λ had an extra base-pair at the location indicated on the Figure. Formally, the λ arrangement could have arisen from an HK022-like ancestor by a 1 bp deletion, or HK022 could have arisen from a λ-like ancestor by a one base-pair insertion. Based on the fact that HK022 and HK97 conform to the usual juxtaposed arrangement of head and tail genes and λ departs radically, we suggest that the HK022 arrangement is the ancestral state and that the λ arrangement is derived. Note that while in principle the unusual start point claimed for the λ gpK protein (Sanger et al., 1982) could be an artifactual consequence of an error of sequence determination in the λ sequence, we do not believe this is the case, because in the sequence around the end of the gene L coding region there is no convincing Shine-Dalgarno sequence of the sort we would expect to see if translation of gpK really began here (and as is found in the HK022 and HK97 sequences). It seems likely that the Shine-Dalgarno sequence that we presume was present in the ancestral form of λ has been lost by mutation over the time since the proposed 1 bp deletion event.
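A three-frame translation of the kind shown in Figure 7 is easy to reproduce, and it makes the key point visible: after a 1 bp insertion or deletion, the same peptide is still encoded but is read in a different frame. The Python sketch below uses the standard genetic code; the toy sequence and function names are ours and are not the λ or HK022 sequences.

```python
# Sketch of a three-frame translation (standard genetic code) showing how a
# 1 bp indel moves the same peptide into another frame. Toy sequence only.
from typing import List

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq from offset frame (0, 1 or 2); '*' marks stop codons.

    Codons containing characters other than A, C, G, T are shown as 'X'.
    """
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def three_frames(seq: str) -> List[str]:
    """Translations of seq in frames 0, 1 and 2."""
    return [translate(seq, f) for f in range(3)]


seq = "ATGGCTGCTGCT"
print(three_frames(seq))        # ['MAAA', 'WLL', 'GCC']
print(translate("C" + seq, 1))  # 'MAAA': one extra base shifts the peptide to frame 1
```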

Figure 7 compares the sequences at the K/I (19/21) junctions of the two phages. Again the two arrangements can in principle be related to each other by a single genetic event: either a deletion of the material between the points marked A and B from an HK022-like ancestor, or an insertion at point C into a λ-like ancestor. As was true for the other end of the gene, it seems more plausible that the atypical λ arrangement is derived by deletion from an HK022-like ancestor. An effect of such a deletion is to extend the gpK protein by 18 amino acid residues beyond the position corresponding to the carboxy terminus of the HK022 gp19 protein. One way to understand the apparent fact that there have been unusual genetic events at both ends of gene K is to postulate that their effects on protein function were compensatory, so that occurrence of the first created a selective pressure favoring the second.

We presume that the generally tightly packed arrangement of the head and tail genes of these phages reflects the existence of an evolutionary tendency toward such an arrangement, either because of a selective pressure favoring that arrangement or, perhaps more plausibly, because of a lack of selection against rearrangements of the sequence, such as deletions of intergenic DNA, that lead toward that arrangement. Thus we view the current condition of the λ K gene as a snapshot representing a stage in an evolutionary excursion by the gene, which started with the regular arrangement seen in HK022 and HK97, and which will eventually return to such an arrangement, though most likely with a net rearrangement of the protein structure.

Tail fiber similarities

HK97 and λ have the same host range, mediated in λ by an interaction between the E. coli LamB protein in the outer membrane of the cell and the phage central tail fiber protein gpJ. The HK97 gene J homolog, gene 24, has weak similarity (32 % amino acid identity) to λ J that extends through the first 66 % of the HK97 gene. A comparable level of similarity extends upstream from gene J through genes I, K, L, M, and part of H, genes whose proteins probably interact with gpJ (Katsura & Kühl, 1975). The dissimilarity of the C-terminal parts of the HK97 and λ J proteins is perhaps surprising, since this region of the λ J protein has been shown to interact with the LamB receptor that HK97 and λ share (Dhillon et al., 1980; Werts et al., 1994). This may mean that the interactions between the N-terminal parts of gpJ and the other tail tip proteins place more stringent constraints on sequence drift than does the interaction between the C-terminal part of gpJ and LamB. Alternatively, it is possible that the end of the J genes is an exchangeable module, as has been seen in a different family of tail fiber genes (Haggard-Ljungquist et al., 1992) corresponding to the stf and tfa genes of λ. Such an interpretation is compatible with the G + C scan of this region of the genome shown in Figure 5, in that there is an atypical G + C content at the end of the HK97 tail fiber gene J, suggestive of recent horizontal exchange of sequences at the end of this gene. The relationship between the HK022 J homolog (gene 24) and λ J is the same as between HK97 and λ, 32 % amino acid identity through the same first segment of the genes (corresponding to 73 % of the HK022 gene) and no similarity thereafter, and in this case the two phages use different cellular receptors. Between HK97 and HK022 the two gpJ homologs are 98 % identical through the first 1145 amino acid residues of the proteins and unrelated for the remaining 151 residues of the HK97 protein and 38 residues of the HK022 protein. This relationship argues for recent horizontal exchange into the sequences at the ends of at least one of these J genes, and this reinforces the inferences drawn above based on G + C content.
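The gpJ figures quoted above imply full protein lengths that are not stated directly. The Python sketch below derives them (1145 shared residues plus the unrelated carboxy-terminal tail of each protein) and includes a minimal percent-identity function of the kind underlying such comparisons; it assumes an ungapped alignment, whereas real comparisons would start from a gapped alignment. The names are ours.

```python
# Worked arithmetic for the HK97/HK022 gpJ homologs, plus a minimal
# percent-identity function for ungapped, pre-aligned sequences (assumption).
SHARED = 1145                              # 98 % identical N-terminal block (text)
UNIQUE_TAIL = {"HK97": 151, "HK022": 38}   # unrelated C-terminal residues (text)


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("expects non-empty sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


for phage, tail in UNIQUE_TAIL.items():
    total = SHARED + tail
    print(phage, total, f"{100 * SHARED / total:.1f}% in shared block")
# HK97 1296 88.3% in shared block
# HK022 1183 96.8% in shared block

print(percent_identity("MKVLA", "MKVIA"))  # 80.0
```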

Analysis of the tail fiber genes of a broad range of E. coli phages showed that there are often two genes encoding the tail fibers (corresponding to the side tail fibers of λ), the first accounting for the main structure of the fibers and the second having a role in assembly of the fibers (Haggard-Ljungquist et al., 1992). The sequences within these genes, especially the first group, are typically mosaic within their coding sequences. HK97 genes 28 and 29 fit into this scheme. The product of gene 28 makes convincing matches to the tail fiber proteins of phages λ (gpStf), P2 (gpH), and T4 and its relatives (gp37 and gp12), and to the fiber-related proteins of defective prophage e14 (P-min) and plasmid p15B. Some of these matches cover only part of the gp28 sequence, in keeping with the intragenic mosaicism of other tail fibers; a notable example is the comparison between HK97 gp28 and the corresponding e14 gene product, in which the two amino acid sequences are 79 % identical between residues 158 and 261 of gp28 and not detectably related upstream from that region. The product of gene 29 matches the tail fiber assembly genes of e14, p15B, λ, and T4.
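An intragenic boundary such as the one in gp28, where identity jumps from undetectable to high, can be located roughly by sliding a window along a pairwise alignment. The Python sketch below is a minimal illustration assuming an ungapped alignment; the window size, threshold, function names and toy sequences are hypothetical choices, and the boundary is reported as the midpoint of the first window that reaches the threshold.

```python
# Sketch: estimate an intragenic mosaic boundary from a windowed identity
# profile over an ungapped alignment. Window, threshold and the toy
# sequences are illustrative assumptions.
from typing import List, Optional, Tuple


def identity_profile(a: str, b: str, window: int) -> List[Tuple[int, float]]:
    """(window start, fraction identical) for each full window."""
    n = min(len(a), len(b))
    return [
        (i, sum(x == y for x, y in zip(a[i:i + window], b[i:i + window])) / window)
        for i in range(n - window + 1)
    ]


def estimate_boundary(a: str, b: str, window: int = 20,
                      threshold: float = 0.5) -> Optional[int]:
    """Midpoint of the first window whose identity reaches threshold, else None."""
    for start, ident in identity_profile(a, b, window):
        if ident >= threshold:
            return start + window // 2
    return None


# Toy alignment: unrelated for 40 residues, identical afterwards.
shared = "MKVLAGHTRE" * 4
a = "A" * 40 + shared
b = "C" * 40 + shared
print(estimate_boundary(a, b))  # 40
```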


HK022 has three genes that are candidates for tail fiber genes of the stf and tfa families, based on their map position: genes 25, 26, and 28. However, database searches do not give any match between the products of genes 25 or 26 and any established tail fiber proteins. (These genes do match the products of the two corresponding genes of bacteriophage N15, which are similarly located and arranged in that phage.) In contrast, HK022 gene 28 does match HK97 gene 28, with 89 % amino acid identity over the first 136 codons. HK022 apparently does not have a homolog of the λ tfa gene family.

Variation in the central region

A 5 kb region of sequence in the center of the phage λ genome, defined by the classical b2 deletion (Kellenberger et al., 1961), is dispensable for λ growth in the laboratory. Early heteroduplex studies (Simon et al., 1971) showed that some other members of the lambdoid family have comparable regions in the centers of their genomes that are apparently not homologous to the λ b2 region. HK97 and HK022, in contrast, do not have such an extended b2 region. The tail fiber genes and the sib regulatory element, which are separated in λ by 4980 bp, are separated in HK97 by only 228 bp. However, that 228 bp contains evidence that HK97's ancestors may have had more extensive material in this region. Immediately to the left of the HK97 sib region (which is identical with the λ sib region) is 109 bp of sequence that is 86 % identical with a non-coding region in the left half of the λ b2 region, such that the HK97 sequence could have been created from the λ sequence in this region by a 3278 bp deletion starting immediately to the left of sib. Starting 17 bp to the left of the 109 bp λ homology is a segment with 49/51 bp identity to a non-coding portion of the type I Shiga-like toxin operon, and 52 bp to the left of this segment is the end of the last tail fiber gene. HK022 has a similarly truncated region between its tail fiber genes and its sib element; we have not detected any database match to it.

Apparent loose organization in the left early operon

In the region of HK022 between the int and xis genes on one side and the generalized recombination genes (39-41) on the other, there is a series of seven genes (32-38) of unknown function. Superficially, these appear to be efficiently organized, with little space or overlap between genes and with plausible Shine-Dalgarno sequences and generally good coding potential across their lengths. However, closer examination suggests that these genes may have undergone relatively recent rearrangements (see Figure 8). For example, the predicted product of HK022 gene 35 makes a convincing sequence match to the carboxy-terminal half of the product of the P22 eaA gene (the eaA gene is in the corresponding part of the P22 genome); sequences corresponding to the amino-terminal half of eaA are not present in HK022. Gene 35 also matches a second ORF in P22 and a portion of an ORF in each of phages 933W and VT2-Sa. The predicted product of HK022 gene 37 makes two short but convincing matches to the λ Ea22 protein: a 24/26 amino acid residue match starting with residue 40 of gp37, and a 24/24 base-pair match covering the last eight codons of the two genes. From the beginning of gene 37 up to where the match to λ Ea22 begins at codon 40, the predicted protein matches the beginning of a protein of unknown function encoded in the E. coli genome. Other regions of HK022 gp37 do not make detectable matches to the databases. A similar intragenic mosaicism was noted (Wulff et al., 1993) between the λ Ea22 protein and the P22 EaD protein, which match at a high level over their first 37 codons and poorly thereafter. In HK97, this region of the genome is much shorter than in HK022; the HK97 organization can be understood formally as having arisen from HK022 by a deletion of 2.2 kb and a substitution of 250 bp from some other source, with the result that HK97 gp37 matches HK022 gp32 at its C-terminal end, HK022 gp37 at its N-terminal end, and something unrelated to HK022 in its middle.

Another example occurs in HK97 immediately downstream of gene 37, in a 34 codon ORF (not listed in Table 1). This "gene" starts with a good Shine-Dalgarno sequence, and its first 11 codons are related (29/33 bp identity) to the first 11 codons of the λ ea22 gene. These are followed by 18 codons of unknown provenance, and the sequence then enters a segment with 50/51 bp identity to a region from within the coding region of the P22 xis gene. The HK97 34 codon "gene" enters this P22 sequence in a different frame from that used in the P22 xis gene, and it terminates after five codons. The impression given by these examples is that there is a substantial amount of reassortment of segments of coding regions among genes in this region. It seems unlikely, though not impossible, that all the different versions of the resulting proteins are fully functional, and we suggest that many of the genes in this region of the genome are on their way from, and possibly on their way to, a functional state, and that we have caught them in transition. That notwithstanding, the λ Ea22 and Ea8.5 proteins, as well as several of the P22 proteins from the corresponding region, are expressed and are detectable during a phage infection (Hendrix, 1971; Youderian & Susskind, 1980).

The early regulatory region

HK97 gene 47 lies in the position expected for a homolog of the λ N gene, which encodes a transcription anti-termination protein. Its best match to the available sequences (save for the two near-identities mentioned below) is a 38 % amino acid sequence identity between its predicted protein and gp24 of P22, the P22 homolog of the λ N protein. Curiously, the termination codon of HK97 gene 47, which corresponds in position to the termination codon of P22 gene 24, is followed by 16 sense codons plus a termination codon with clear similarity (14 of 16 amino acid identity) to the last 17 codons of the phage 21 N gene. We believe it is unlikely that this fragment of a 21-like N gene provides any advantage to HK97, and therefore we speculate that it is the relic of an out-of-register recombination event, with little or no functional significance. The corresponding "N" genes of lambdoid phages H19B (Neely & Friedman, 1998) and 933W (Plunkett et al., 1999) have the same arrangement as HK97 gene 47, including the 21-like gene fragment and nearly identical DNA sequence, suggesting that this gene and its associated gene fragment have been distributed by recombination among at least these three phages in very recent evolutionary time. Experiments with the N gene of H19B show that it functions like the λ N gene (Neely & Friedman, 1998); the great similarity between the H19B and HK97 N gene sequences argues that the HK97 gene is also functionally an N gene. As described in detail elsewhere (Robert et al., 1987; Weisberg et al., 2000), the HK022 gene in this position is the nun gene, which functions somewhat similarly to N but to substantially different effect.

HK97 has the same immunity as λ, and in accord with this the cI repressor gene sequences are nearly identical, differing in 17 nucleotides, which translate into two conservative amino acid changes between the predicted proteins. The Cro proteins are also predicted to be nearly identical, and the two operators to which both CI and Cro bind differ by only 1 bp between the two phages. Figure 9 shows a dot matrix comparison of the immunity regions of λ and HK97. Note that the sequence similarity extends only over the two repressor genes (cI and cro) and the two operators on which they act (OL and OR); the flanking regions are completely divergent. We take this to represent co-segregation of interacting, co-evolved sequences, presumably mediated by selection for immunity function acting on diversity generated by recombination.

Figure 8. Intragenic mosaicism in the left early operon of HK022. The boxes represent HK022 genes and the colored bars represent sequence matches to the indicated phage or E. coli genomes. Red bars indicate nucleotide sequence similarities detected in a BlastN search with default parameters. Green bars indicate amino acid sequence matches detected in BlastP searches.

Figure 9. Immunity region comparisons between λ and HK97. The immunity regions of HK97 and λ were compared in a dot matrix plot. Gene positions are indicated. The diagonal lines show regions where the sequences match. The extents of the two operators are indicated with arrows.

The "nin" regions

Between the DNA replication genes (O and P) of λ and the regulator of late transcription (gene Q) lie ten genes of largely unknown function (though functions have been inferred for some of them; Hollifield et al., 1987; Mahdi et al., 1996; Sawitzke & Stahl, 1994; Sharples et al., 1998), which can be deleted without serious effects on λ growth in the laboratory. This group of genes is referred to as the "nin" region, from the name of a deletion with an N-independent phenotype, nin5, which covers this region. These genes are remarkable for the regularity and efficiency with which they are packed together, with the termination codon of one gene typically overlapping the initiation codon of the next gene (Kroger & Hobom, 1982). Comparison with the corresponding regions of HK022, HK97 and P22 (Figure 10) shows that these phages have similar groups of genes, similarly tightly arranged, and that some but not all of the nin region genes of any one phage match those of any other member of the group. Despite their dispensability for laboratory growth of λ, these genes are clearly under positive selection, as can be seen by the preservation of the ORFs from one phage to another, as well as by the preservation of the amino acid sequences encoded by them. In a comparison of ORF 146 of λ and HK97 (HK97 gene 61; also known as gene orf in λ), for example, there are 16 nucleotide differences between the sequences, but only two of those translate into changes in amino acid sequence; this argues for selection at the level of protein function.
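The argument from ORF 146, like the cI comparison above, rests on sorting nucleotide differences by whether they change the encoded amino acid. The minimal Python sketch below shows that bookkeeping. It assumes two gap-free coding sequences (A, C, G, T only) aligned in the same reading frame, and it uses the standard genetic code. It counts codons whose translations differ, so it is not a formal estimate of synonymous and non-synonymous substitution rates. The name `compare_coding` is hypothetical.

```python
from __future__ import annotations

# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def compare_coding(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (nucleotide differences, amino acid differences) for two
    aligned, gap-free, in-frame coding sequences of equal length."""
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    nt_diffs = sum(x != y for x, y in zip(a, b))
    # A codon pair counts only if its translations differ, so differences
    # confined to synonymous codons add to nt_diffs but not to aa_diffs.
    aa_diffs = sum(
        CODON_TABLE[a[p:p + 3]] != CODON_TABLE[b[p:p + 3]]
        for p in range(0, len(a), 3)
    )
    return nt_diffs, aa_diffs


if __name__ == "__main__":
    # GCT->GCC (Ala) and AAA->AAG (Lys) are both synonymous.
    print(compare_coding("ATGGCTAAA", "ATGGCCAAG"))
```

For example, `compare_coding("ATGGCTAAA", "ATGGCCAAG")` reports two nucleotide differences and no amino acid differences. This is the pattern that, summed over a whole gene, argues for selection on the protein rather than on the DNA.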

Another striking feature of these nin region comparisons is that each different phage has a different set of genes in this region, as if each phage had drawn a set of about ten genes for this region from a larger set of genes available to it. By noting how often a particular gene appears once, twice, or three times among the four phages illustrated in Figure 10, and making some simplifying assumptions (equal frequency of the different genes in the pool, random selection of genes from the pool), we calculate that there must be ~30-50 genes in the lambdoid nin region gene pool, from which each phage "chooses" about ten. Each individual set of genes must confer a particular set of capabilities on the phage that carries it. Consequently, we propose that a function of the nin region genes is to adapt the phage to a particular ecological niche.
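The pool-size estimate can be made concrete with a simple occupancy model built on the same two simplifying assumptions. The pool holds N equally frequent genes, and each of four phages independently draws about ten of them at random. Under these assumptions a given gene is carried by any one phage with probability p = 10/N. The expected number of pool genes found in exactly k of the four phages is therefore N · C(4, k) · p^k · (1 - p)^(4 - k). The sketch below is an illustration of that reasoning, not necessarily the calculation the authors performed. It computes the expected profile and picks the N whose profile lies closest, by least squares, to observed counts. The observed counts would come from a comparison such as Figure 10 and are not reproduced here. The function names and the search range of 10-200 are hypothetical choices.

```python
from __future__ import annotations

from math import comb


def expected_profile(pool_size: int, genes_per_phage: int = 10,
                     phages: int = 4) -> list[float]:
    """Expected number of pool genes found in exactly k phages, k = 1..phages,
    assuming equally frequent genes and independent random draws per phage."""
    if pool_size < genes_per_phage:
        raise ValueError("pool must be at least as large as one phage's gene set")
    p = genes_per_phage / pool_size  # chance a given gene is in a given phage
    return [pool_size * comb(phages, k) * p**k * (1 - p) ** (phages - k)
            for k in range(1, phages + 1)]


def best_pool_size(observed: list[float], genes_per_phage: int = 10,
                   phages: int = 4, sizes=range(10, 201)) -> int:
    """Pool size (searched over `sizes`) whose expected profile is closest,
    by least squares, to observed counts of genes seen in exactly 1..phages phages."""
    def squared_error(n: int) -> float:
        expected = expected_profile(n, genes_per_phage, phages)
        return sum((o - e) ** 2 for o, e in zip(observed, expected))
    return min(sizes, key=squared_error)
```

As a self-check, a profile generated by `expected_profile(40)` is recovered exactly by `best_pool_size`. With real counts, the answer can only be as good as the equal-frequency and random-draw assumptions, which is one reason a range (30-50) rather than a single value is the more honest summary.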


Figure 10. Sequence comparisons in the nin regions of lambdoid phages. The sources of the five DNA regions being compared are indicated at the right of the Figure; the GenBank accession number for the E. coli sequence shown is L04539. Boxes represent genes and the colored shading shows the locations and degrees of sequence matches. With the exception of P22 genes 12 and 23, the numbers in the boxes indicate the length of the gene in codons. HK022 gene numbers are shown at the top.


Other noteworthy features of this region include the facts that ORF 241 (gene 64) of HK97 and HK022 shares sequence similarity in its downstream half with genes ant1 and kilA of phage P1 (Yarmolinsky & Sternberg, 1988), and that ORF 221 of λ is a homolog of a mammalian protein phosphatase (Cohen & Cohen, 1989). In the only departure from regular juxtapositions of genes, HK97 and HK022, relative to λ and P22, have an additional 215 bp and 1462 bp, respectively, in the region just to the left of their genes 61. Since the neat gene packing arrangement in λ and P22 is disrupted in HK97 and HK022, it seems likely that the arrangements in HK97 and HK022 were derived by insertion into a λ-like ancestor. (For example, HK022 gene 57 does not have a convincing Shine-Dalgarno sequence, and it overlaps the very short upstream gene 56, which has a good Shine-Dalgarno sequence. This has the appearance of a frameshift error in the sequence determination. However, our sequence data are redundant and unequivocal on both strands through this region, so our belief is that the "error" is in the actual sequence and not in the sequence determination.) Finally, as Figure 10 shows, some of the nin region genes match some sequence from E. coli, which is located about 1 kb upstream from a Shiga toxin-like operon. This sequence, which we presume to be part of a prophage carrying the toxin operon, has a frameshift in the homolog of P22 orf58 and an insertion sequence in the middle of the homolog of P22 orf110. This suggests to us that selection pressure has been off the prophage genes that would function during lytic growth for long enough that they have begun to accumulate mutations and become non-functional.

Discussion

Genome organization and structure

HK97 and HK022 are seen here to have genome organizations that are very similar to each other and to the two other members of the family used for comparison, λ and P22. This similarity includes both the genetic functions that are present and the order of genes along the genome. The similarity of genes and gene order presumably accounts for the fact that hybrids between members of this family can be created easily, in that a single reciprocal recombination event between corresponding positions in two genomes of the family will produce progeny with full sets of the essential genetic functions of the family. However, within the context of those organizational similarities, the phages can be very different in sequence at any one position in the genome. For the four phages compared here, for example, there are only four genes (the Q anti-terminator gene, the S holin gene, and two small genes in the nin region) that are present in essentially the same form (>90 % nucleotide sequence identity) in all four phages.

Inferences about evolution and population structure

The comparisons of genome sequences we describe here show that these phages are genetic mosaics which, in the context of a common gene organization, can have any of a variety of more or less related sequences at any one functional position in the sequence. That much has been clear since the comparisons of lambdoid genomes by DNA heteroduplex mapping 30 years ago (Simon et al., 1971). The more precise comparisons made possible by the availability of complete genome sequences allow us to decipher the nature of the mosaicism at a more detailed level. In particular, the precise locations of the mosaic boundaries, which we take to be sites of ancestral illegitimate recombination, are revealing. They are far from random in their locations: most mosaic boundaries fall at or very near gene boundaries, but not at every gene boundary, and in a few cases the mosaic boundaries fall within coding regions. The most economical hypothesis we are aware of to explain the particular pattern of mosaicism we see is that mosaic boundaries can occur where they do not disrupt function of the genetic material and not elsewhere. Thus few mosaic boundaries fall within protein coding regions, consistent with the view that a protein that is a mosaic of two structurally distinct proteins (albeit two proteins with a common function) is unlikely to be fully functional. When a mosaic boundary does fall within a coding region, it is often possible to identify the location of the boundary as coinciding with a domain boundary in the protein. Similarly, for genes whose proteins must interact intimately, for example the head structural genes, there is typically no mosaic boundary within the group of genes whose protein products are thought to interact closely. In sum, these genomes appear to be mosaics of functional units, where the functional units may be individual genes, groups of genes whose products interact intimately, or portions of genes encoding a functional domain of the protein.
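Locating a mosaic boundary amounts to finding a place where pairwise identity changes abruptly. The minimal sketch below computes percent identity in adjacent, non-overlapping windows and reports window edges where identity jumps between a high and a low threshold. It assumes the two sequences are already aligned without gaps; real comparisons must accommodate insertions and deletions. The 50 nt window, the 90 % and 60 % thresholds, and the function names are arbitrary illustrative choices, not parameters from this study.

```python
from __future__ import annotations


def window_identity(seq_a: str, seq_b: str, window: int = 50) -> list[tuple[int, float]]:
    """Percent identity in consecutive non-overlapping windows of two gap-free
    aligned sequences, as (window_start, percent_identity) pairs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = seq_a.upper(), seq_b.upper()
    profile = []
    for start in range(0, len(a) - window + 1, window):
        chunk_a, chunk_b = a[start:start + window], b[start:start + window]
        same = sum(x == y for x, y in zip(chunk_a, chunk_b))
        profile.append((start, 100.0 * same / window))
    return profile


def mosaic_boundaries(seq_a: str, seq_b: str, window: int = 50,
                      high: float = 90.0, low: float = 60.0) -> list[int]:
    """Window edges where identity switches between >= high and <= low,
    i.e. candidate sharp transitions between mosaic segments."""
    profile = window_identity(seq_a, seq_b, window)
    edges = []
    for (_, left), (start, right) in zip(profile, profile[1:]):
        if (left >= high and right <= low) or (left <= low and right >= high):
            edges.append(start)
    return edges
```

A boundary found this way is located only to within one window. Shrinking the window near a reported edge narrows it down, at the cost of noisier identity values.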

The locations of mosaic boundaries do not seem to be rigidly fixed, however, and we do not suppose that the illegitimate recombination events that give rise to them are constrained to occur only at certain sites. Rather, we suggest that such recombination can occur virtually anywhere along the sequences of the phages, but that only the minority of recombination events that do not damage function in the recombined genome survive to be detected in our sequence comparisons. Support for this view comes from the observations that there is no discernible sequence difference between sites where recombination is observed and where it is not, that independent recombination events at the same gene boundary can occur at locations separated by several nucleotides, and that in regions that do not contain protein coding or other obviously functionally important sequences, there is often evidence of multiple recombination events. This is not to deny that "microhomologies" between two sequences, such as could be provided by the common presence of Shine-Dalgarno sequences and initiation codons at the beginnings of genes, might bias the locations of recombination events (Baker et al., 1991; Campbell, 1994). Note that the fact that the mosaic boundaries can be described as having resulted from a single illegitimate recombination event does not imply that they must necessarily have arisen in that way. The example of the HK97 N gene described above may represent an intermediate stage in a two-step mechanism to generate a precise mosaic boundary: the current arrangement of this gene is most easily explained as the product of an out-of-register recombination that resulted in a tandem, non-identical duplication of the end of the N gene (but one that is evidently still functional). If this phage were to suffer a small deletion just downstream of its N gene, the resulting sequence would have the appearance of the product of a single, in-register, illegitimate recombination event.

The recombination events that give rise to the mosaic boundaries could occur when a host cell is simultaneously infected by two lambdoid phages. However, it is likely that a much more frequent source of such events is recombination between an infecting phage and a resident prophage (Campbell, 1988). Given the widespread occurrence of prophages, complete or partial, in bacterial genomes, it is likely that a lambdoid phage will find an opportunity for recombination with prophage genes in virtually any host it infects. Laboratory experiments on frequencies of illegitimate recombination suggest that this is a relatively rare occurrence, perhaps 10⁷-fold less frequent than homologous recombination, but it has evidently occurred many times over the evolutionary history of the lambdoid phages. Homologous recombination, on the other hand, occurs with high frequency, with a considerable number of the progeny of a lambdoid phage infection being products of such recombination. Homologous recombination thus provides a mechanism for the novel combinations of genes (or groups of genes, or segments of genes) produced by illegitimate recombination to be reassorted with each other and spread through the population at a high rate.

The genes of these phages are generally very tightly and efficiently organized, with little or no space between genes. For the most part, in cases where there is space between genes, it is occupied by known or suspected regulatory sequences such as promoters, operators, and terminators. This description applies particularly to those genes and sites that are known to have important roles in the phage life-cycle, and we presume it reflects a long history of selection for those functions, imposed on diversity generated by mutation and recombination. There are, however, some regions of these genomes that contain relatively long stretches of DNA with no apparent protein-coding or other function, for example much of the DNA at the right ends of the genomes beyond the lysis genes. There are also other regions, such as the region upstream from the int and xis genes, for which comparisons among genomes suggest that the genes seen there are relatively recent composites of previously existing genes. Although it is difficult to be certain, it appears likely that some or all of this DNA is not providing a specific selective advantage to the phage. If so, the question arises why this DNA persists in the phage genome, particularly in light of the observation that the majority of the genome gives the appearance of very parsimonious use of the DNA.

A striking feature of the sequence similarities among lambdoid phages described here and elsewhere in the literature is not only that any two phages of the family share related sequences, but that some of the sequences are very close indeed, 97-98 % identical in nucleotide sequence or better. Thus, the N genes (plus their 21 N-like appendix) of HK97, 933W and H19B are >97 % identical in pairwise comparisons; the cI, cro, and int genes of lambda and HK97 are 97-98 % identical; lambda and 933W share ~4 kb of 97 % identical sequence in the region to the left of the immunity region (Plunkett et al., 1999); and HK97 and HK022 show ~10 kb of >97 % identity in the region crossing the cos sites. These similarities argue that these phages have been in genetic contact with each other in recent evolutionary time, that is, within the time it takes for their sequences to drift apart by 2-3 %. The fact that these phages show such high levels of similarity argues that recent genetic communication among the lambdoid group has taken place over a geographical span including at least Hong Kong, North America and Europe (or South America). We do not yet have a way to calibrate how long "recent evolutionary time" actually is in this context. However, we believe these data suggest that on that time-scale, the lambdoid phages have effectively constituted a single genetic population with a global range.

Functional inferences

In addition to allowing inferences about evolutionary mechanisms and about the structure of bacteriophage populations, sequence comparisons of the sort reported here make possible certain inferences about biological function and insights into other features of the sequences that could not be derived from examining any single genome sequence alone. Perhaps the clearest example of this is in the inferences that can be drawn about which elements of a multi-component system interact directly with each other, based on observing which genetic elements segregate together. The example shown in Figure 9, in which the HK97 and λ cI and cro genes co-segregate with the two operators that the CI and Cro repressors act on, illustrates a correlation between co-segregation of sequences and biochemical interaction for a system that is well characterized biochemically. We make similar suggestions above about the interaction of the two domains of the Erf protein with other components of the recombination apparatus and about the structural and functional organization of the gpFII protein and its homologs in providing a structural adapter between heads and tails of virions. These suggestions will require testing by direct experiment before they can be fully evaluated. Nonetheless, the availability of multiple genomic sequences, and comparisons among them, has allowed us to make clear, specific, and testable predictions about the biochemical behavior of these systems.
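Co-segregation of the immunity genes and operators was read from a dot matrix comparison (Figure 9). In such a plot, a point is drawn wherever a short word in one sequence exactly matches a word in the other. Shared segments therefore appear as diagonal runs, and divergent flanking regions appear empty. The minimal sketch below shows that computation. It assumes exact matches of a fixed word length (11 nt by default, an illustrative choice and not the parameter used for the figure). It indexes every word of one sequence and looks up each word of the other. The hypothetical helper `render` draws a coarse text version of the plot.

```python
from __future__ import annotations


def dot_matrix(seq_a: str, seq_b: str, word: int = 11) -> set[tuple[int, int]]:
    """(i, j) pairs where the length-`word` substring of seq_a starting at i
    exactly matches the substring of seq_b starting at j."""
    a, b = seq_a.upper(), seq_b.upper()
    # Index every word of seq_b so each word of seq_a costs one lookup.
    positions: dict[str, list[int]] = {}
    for j in range(len(b) - word + 1):
        positions.setdefault(b[j:j + word], []).append(j)
    hits = set()
    for i in range(len(a) - word + 1):
        for j in positions.get(a[i:i + word], []):
            hits.add((i, j))
    return hits


def render(hits: set[tuple[int, int]], len_a: int, len_b: int, cell: int = 100) -> str:
    """Coarse text plot: one character per cell x cell block, '*' if any hit
    falls in it. Rows follow seq_a and columns follow seq_b."""
    rows = (len_a + cell - 1) // cell
    cols = (len_b + cell - 1) // cell
    grid = [["."] * cols for _ in range(rows)]
    for i, j in hits:
        grid[i // cell][j // cell] = "*"
    return "\n".join("".join(row) for row in grid)
```

Longer words suppress chance matches but also hide diverged homology. The word length is the main trade-off when reading such a plot as evidence that two segments did or did not travel together.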

The morons in the tail gene region of HK97 and HK022 might have been evident without related sequences for comparison, but the availability of comparison genomes has emphasized their presence and the possibility that they may be mobile elements. These elements have a putative promoter followed by a protein-coding region and a putative stem-loop (ρ-independent) transcription terminator. On this basis we expect that they should be expressed from a repressed prophage. In the case for which evidence is available on this point, the φ80 cor gene, which is a homolog of the HK022 gene 27 moron, expression does occur from the prophage. We suggest that, as is evidently true for the cor moron, all of the morons may supply functions that confer a selective advantage on the host. In this sense, the morons may be a means by which the prophage pays rent to the host cell.

In λ, the lom gene, which is in an analogous position in the gene order to that of cor in φ80 and HK022, is followed by a stem-loop terminator structure. We have not been able to recognize a promoter sequence upstream from lom, but there is in fact presumptive evidence that lom is expressed from the prophage (Reeve & Shaw, 1979), which argues that some part of the upstream sequence can act as a promoter. The λ bor gene, a virulence factor that is oriented backwards to late lytic transcription and located near the beginning of the late operon, is also likely to be expressed from a repressed prophage (Barondess & Beckwith, 1990). In this case, we find no recognizable promoter or terminator sequences, but if bor is in fact expressed, there must be promoter activity. The occurrence of morons may not be limited to the lambdoid family of phages. For example, gene 32 of the Streptomyces phage φC31 is flanked by a promoter and terminator (Smith et al., 1999), it is expressed from the prophage (Howe & Smith, 1996), and it is absent in the corresponding region of the otherwise very similar phage φBT1 (M. Smith, personal communication). Finally, although they are by now embedded in more complex regulatory circuits, the repressor genes of the lambdoid and other phages, together with their promoters and terminators, could be regarded as morons.

In addition to the stem-loop putative terminators following the moron genes, there are seven other such potential stem-loop structures in HK97 and nine in HK022, plus three in λ and two or three in P22. These "free-standing" stem-loops occupy space between two genes, as do the stem-loops that are parts of morons. They differ from the moron case in that the gene immediately upstream is one that appears to be a more universal and permanent part of the genome than is the case for the moron genes. A simple hypothesis for the origin of the free-standing stem-loops is that they mark a place where there was a moron in the past but that all of the moron except the stem-loop has been lost by deletion. Whether this is correct or whether the free-standing stem-loops have some other origin, their presence in the phage genome argues that they provide some selective benefit either to the phage directly or to a cell carrying it as a prophage.

We have no direct evidence that the stem-loops function as transcription terminators. However, since their oligo(U) stretches give them a polarity, we are able to say that they are all oriented in the same way with respect to the direction of transcription, namely, in the direction that would allow them to act as terminators. This is true as well of the stem-loops associated with the reverse-oriented morons of genes 23 of HK97 and HK022: although the stem-loops are backwards with respect to transcription from the late lytic promoter, they are in the correct orientation to terminate transcription from the moron promoter. We have entertained the possibility that the stem-loops have some function related to transcription or expression of mRNA other than termination, for example modulating the stability of the part of the mRNA with which they are associated. However, we consider this or other schemes in which the stem-loops have differential effects on mRNA function unlikely, because the distribution of stem-loops among the head and tail genes of these phages is different for each phage. This is so even though the roles in virion assembly and structure of the homologous genes (and any special requirements they may have for regulation of their expression) are very likely the same for the different phages. Consequently, we suggest that the role of the stem-loops is in fact termination. In the case of the moron-associated stem-loops, this would prevent transcription of the moron from extending into the surrounding prophage genes; in the case of the free-standing stem-loops, it would curtail inappropriate transcription of prophage genes whether it originated from outside or within the prophage DNA. We note that it is possible for such transcription terminators to be located in these phages without interfering with lytic growth of the phage, because the RNA polymerase that carries out lytic expression of these genes has been acted on by the late regulator ("Q") protein, with the effect that it reads through termination signals. Thus, in this sense, one consequence of having an anti-termination mechanism for lytic transcription is to allow the prophage to act as host to morons and to carry other terminators that keep prophage genes (except moron genes) silent.
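The orientation argument depends on the asymmetry of an intrinsic terminator. It consists of a GC-rich inverted repeat that can fold into a stem-loop, followed on one strand only by a run of T residues (the oligo(U) of the transcript). The paper does not describe how its stem-loops were identified, so the following is only a minimal heuristic sketch. It assumes an exact inverted repeat of fixed length, a loop of 3-8 nt, a stem at least half G+C, and at least four T residues shortly downstream. All of these parameters are illustrative assumptions, and practical terminator finders score stem stability thermodynamically instead.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_terminators(seq: str, stem: int = 6, loops=range(3, 9),
                     t_run: int = 4, max_gap: int = 2,
                     min_gc: float = 0.5) -> list[tuple[int, int]]:
    """Candidate intrinsic terminators on the strand given, as (start, end):
    an exact inverted repeat of `stem` bp around a loop, followed within
    `max_gap` nt by a run of at least `t_run` T residues."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        if (left.count("G") + left.count("C")) / stem < min_gc:
            continue  # require a GC-rich stem
        for loop in loops:
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break  # longer loops would also run off the end
            if right != revcomp(left):
                continue
            tail_start = j + stem
            tail = seq[tail_start:tail_start + max_gap + t_run]
            run_at = tail.find("T" * t_run)
            if run_at != -1:
                hits.append((i, tail_start + run_at + t_run))
                break  # report one loop size per stem start
    return hits
```

Because the T run is required on the scanned strand only, calling `find_terminators(revcomp(seq))` finds candidates in the opposite orientation. That is how the oligo(U) stretch assigns each stem-loop a direction. Coordinates from that second scan refer to the reverse complement.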

An implication of this line of argument is that it is important for a prophage to maintain transcriptional silence (excepting moron genes). If so, a prediction is that other temperate phages will also have a mechanism for maintaining transcriptional silence in the prophage. We argue elsewhere that this is true for temperate mycobacteriophages L5 and D29 (Brown et al., 1997; Ford et al., 1998). L5 is not known to use an anti-termination system, and it does not have terminators of the sort we find in the lambdoid phages. However, it does have numerous repressor binding sites ("stoperators"), which cause abortion of transcription when repressor is bound. We believe that this has the same effect in L5 that we envisage for the terminators in the lambdoid phages, namely, to stop undesirable transcription from the repressed prophage but to allow transcription of those same sequences once the lytic cycle has been entered.

Materials and Methods

Bacteriophage strains

HK97 and HK022 were obtained as prophages of E. coli K12 from Dr Tarlochan Dhillon in about 1982. DNA for sequencing HK97 was derived from virions produced by UV induction from the original lysogen. HK022 virions were made by infection of a liquid culture using a phage derived from the lysogen that had been stored and used in the laboratory of Dr Robert Weisberg.

Gene numbering scheme

Genes were numbered in order from left to right on the genome maps. Numbers were skipped on one phage or the other to allow genes that are homologs between the two phages to have the same number. For genes that are homologs of a well-characterized gene of λ or P22 (e.g. int, cI, erf), we propose that these names be used in preference to the numbers, as is already the practice in the literature.

Sequence determination

DNA sequence was determined using dideoxynucleotide chain terminators (Sanger et al., 1977). For HK97, sonicated and repaired phage genomic DNA was cloned into M13mp18 and the sequence determined with single-stranded DNA templates and a universal primer. Products were labeled by incorporation of [α-³⁵S]thioATP, separated on a gel, and analyzed by exposing X-ray film to the dried gel and reading the sequence manually as described (Duda et al., 1995b). Individual sequences were aligned and edited in the GCG suite of programs (Genetics Computer Group, Madison, WI).

For HK022, genomic DNA was cut randomly with DNase I and cloned into the EcoRV site of pBluescript SK− (Stratagene, La Jolla, CA), as described (Ford et al., 1998). Thermocycling sequencing reactions were carried out with fluorescently labeled dideoxy terminators, and the products were analyzed with an ABI 377 sequencer (Perkin-Elmer Corp., Applied Biosystems Div., Foster City, CA). Individual sequences were aligned and edited in the program Sequencher (Gene Codes Corp., Ann Arbor, MI). All parts of the sequences for both phages were determined completely on both strands.

The structure of the ends of the mature DNA from virions was determined by using synthetic oligonucleotides to prime thermocycling sequencing reactions in the directions of the predicted positions of the ends. Comparisons of the results with the sequence of the annealed ends, derived from the main sequencing project, gave the structures reported in the text. As a control, the structure of the ends of phage λ virion DNA was determined in parallel and the expected results were obtained.

The N-terminal sequence of the HK022 major tail subunit was determined from the protein in the corresponding band of an SDS/polyacrylamide gel of purified virions. The appropriate band was identified, as described earlier for HK97 (Popa et al., 1991), as the only non-head protein sufficiently abundant to account for the tail shaft structure. The band from the gel was transferred to PVDF paper (BioRad, Hercules, CA) and stained with Coomassie brilliant blue dye. A paper strip containing the band was excised (LeGendre & Matsudaira, 1989), and N-terminal sequence analysis was performed on a protein sequenator (Porton 2090E, Beckman Instruments, Inc., Fullerton, CA).

Sequence analysis

The sequences were compared to each other and to other sequences in the databases at both the nucleotide and amino acid levels, using the FASTA and TFASTA programs as implemented in the GCG package and the gapped Blast and Blast 2 programs available from the National Library of Medicine. GeneMark was used for initial identification of coding regions.
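GeneMark assesses coding potential with statistical (Markov-model) measures of coding sequence. A more elementary first step in any such annotation is a list of candidate open reading frames. The sketch below covers only that simpler step and is not GeneMark. It scans all six reading frames for ORFs running from the first ATG after a stop to the next in-frame stop codon, and it keeps those above an arbitrary minimum length (50 codons by default, an assumption). Alternative start codons such as GTG, which some phage genes can use, are ignored, and the function names are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 50) -> list[tuple[str, int, int]]:
    """ATG-initiated ORFs of at least `min_codons` codons (stop included) in
    all six frames, as (strand, start, end): 0-based, end-exclusive
    coordinates on the input sequence."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for p in range(frame, n - 2, 3):
                codon = s[p:p + 3]
                if start is None and codon == "ATG":
                    start = p  # first ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    if (p + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start, p + 3))
                        else:  # map reverse-complement coordinates back
                            orfs.append(("-", n - (p + 3), n - start))
                    start = None
    return orfs
```

A list like this overcounts genes, because long ORFs arise by chance on the non-coding strand and in other frames. Supplying a coding-potential score to choose among such candidates is the job a tool like GeneMark performs.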

Data Bank accession numbers

The sequences are available in GenBank under accession numbers AF069529 (HK97) and AF069308 (HK022).

Acknowledgments

This work was supported by NIH grants GM47795 and GM51975 to R.W.H. We thank Sherwood Casjens, Susan Godfrey, Jeffrey Lawrence, and Maggie Smith for thoughtful comments on the manuscript, and Bruce Stocker for information about the origins of P22.

References

Atkinson, B. L. & Gottesman, M. E. (1992). The Escherichia coli rpoB60 mutation blocks antitermination by coliphage HK022 Q-function. J. Mol. Biol. 227, 29-37.

Baker, J., Limberger, R., Schneider, S. J. & Campbell, A. (1991). Recombination and modular exchange in the genesis of new lambdoid phages. New Biol. 3, 297-308.

Barondess, J. J. & Beckwith, J. (1990). A bacterial virulence determinant encoded by lysogenic coliphage λ. Nature, 346, 871-874.

Bergh, Ø., Børsheim, Y., Bratbak, G. & Heldal, M. (1989). High abundance of viruses found in aquatic environments. Nature, 340, 467-468.

Borodovsky, M. & McIninch, J. D. (1993). GeneMark: parallel gene recognition for both strands. Comput. Chem. 17, 123-133.

Botstein, D. (1980). A theory of modular evolution for bacteriophages. Ann. N.Y. Acad. Sci. 354, 484-490.

Brown, K. L., Sarkis, G. J., Wadsworth, C. & Hatfull, G. F. (1997). Transcriptional silencing by the mycobacteriophage L5 repressor. EMBO J. 16, 5914-5921.

Campbell, A. (1988). Phage evolution and speciation. In The Bacteriophages (Calendar, R., ed.), vol. 1, pp. 1-14, Plenum Publishing Co., New York.

Campbell, A. & Botstein, D. (1983). Evolution of the lambdoid phages. In Lambda II (Hendrix, R. W., Roberts, J. W., Stahl, F. W. & Weisberg, R. A., eds), pp. 365-380, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Campbell, A. M. (1994). Comparative molecular biology of lambdoid phages. Annu. Rev. Microbiol. 48, 193-222.

Casjens, S. (1974). Bacteriophage lambda FII gene protein: role in head assembly. J. Mol. Biol. 90, 1-20.

Casjens, S. R., Hatfull, G. F. & Hendrix, R. W. (1992). Evolution of dsDNA tailed-bacteriophage genomes. Semin. Virol. 3, 383-397.

Cohen, P. T. & Cohen, P. (1989). Discovery of a protein phosphatase activity encoded in the genome of bacteriophage lambda. Probable identity with open reading frame 221. Biochem. J. 260, 931-934.

Dhillon, E. K., Dhillon, T. S., Lai, A. N. & Linn, S. (1980). Host range, immunity and antigenic properties of lambdoid coliphage HK97. J. Gen. Virol. 50, 217-220.

Dhillon, T. S. & Dhillon, E. K. (1976). Temperate coliphage HK022. Clear plaque mutants and preliminary vegetative map. Jpn. J. Microbiol. 20, 385-396.

Duda, R. L., Hempel, J., Michel, H., Shabanowitz, J., Hunt, D. & Hendrix, R. W. (1995a). Structural transitions during bacteriophage HK97 head assembly. J. Mol. Biol. 247, 618-635.

Duda, R. L., Martincic, K. & Hendrix, R. W. (1995b). Genetic basis of bacteriophage HK97 prohead assembly. J. Mol. Biol. 247, 636-647.

Ford, M. E., Sarkis, G. J., Belanger, A. E., Hendrix, R. W. & Hatfull, G. F. (1998). Genome structure of mycobacteriophage D29: implications for phage evolution. J. Mol. Biol. 279, 143-164.

Gold, L. (1988). Post-transcriptional regulatory mechanisms in E. coli. Annu. Rev. Biochem. 57, 199-233.

Haggard-Ljungquist, E., Halling, C. & Calendar, R. (1992). DNA sequences of the tail fiber genes of bacteriophage P2: evidence for horizontal transfer of tail fiber genes among unrelated bacteriophages. J. Bacteriol. 174, 1462-1477.

Hatfull, G. F. & Sarkis, G. J. (1993). DNA sequence, structure, and gene expression of mycobacteriophage L5: a phage system for mycobacterial genetics. Mol. Microbiol. 7, 395-405.

Hendrix, R. W. (1971). Identification of proteins coded in phage lambda. In The Bacteriophage Lambda (Hershey, A. D., ed.), pp. 355-370, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Hendrix, R. W. & Duda, R. L. (1992). Bacteriophage λ PaPa: not the mother of all λ phages. Science, 258, 1145-1148.

Hendrix, R. W. & Duda, R. L. (1998). Bacteriophage HK97 head assembly: a protein ballet. In Advances in Virus Research (Maramorosch, K., Murphy, F. A. & Shatkin, A. J., eds), vol. 50, pp. 235-288, Academic Press, New York.

Hollifield, W. C., Kaplan, E. N. & Huang, H. V. (1987). Efficient RecABC-dependent, homologous recombination between coliphage lambda and plasmids requires a phage ninR region gene. Mol. Gen. Genet. 210, 248-255.

Howe, C. W. & Smith, M. C. (1996). Gene expression in the cos region of the Streptomyces temperate actinophage φC31. Microbiology, 142, 1357-1367.

Katsura, I. (1987). Determination of bacteriophage lambda tail length by a ruler. Nature, 327, 73-75.

Katsura, I. & Hendrix, R. W. (1984). Length determination in bacteriophage lambda tails. Cell, 39, 691-698.

Katsura, I. & Kühl, P. W. (1975). Morphogenesis of the tail of bacteriophage lambda. III. Morphogenetic pathway. J. Mol. Biol. 91, 257-273.

Kellenberger, G., Zichichi, M. L. & Weigle, J. (1961). A mutation affecting the DNA content of bacteriophage lambda and its lysogenizing properties. J. Mol. Biol. 3, 399.

Kroger, M. & Hobom, G. (1982). A chain of interlinked genes in the ninR region of bacteriophage lambda. Gene, 20, 25-38.

Lawrence, J. G. & Ochman, H. (1997). Amelioration of bacterial genes: rates of change and exchange. J. Mol. Evol. 44, 383-397.

Lederberg, E. M. (1951). Lysogenicity in E. coli K-12. Genetics, 36, 560.

LeGendre, N. & Matsudaira, P. T. (1989). Purification of proteins and peptides by SDS-PAGE. In A Practical Guide to Protein and Peptide Purification for Microsequencing (Matsudaira, P. T., ed.), pp. 49-57, Academic Press, New York.

Levin, M. E., Hendrix, R. W. & Casjens, S. R. (1993). A programmed translational frameshift is required for the synthesis of a bacteriophage lambda tail assembly protein. J. Mol. Biol. 234, 124-139.

Lilleengen, K. (1948). Typing Salmonella typhimurium by means of bacteriophage. Acta Path. Microb. Scand. Suppl. 77.

Lucchini, S., Desiere, F. & Brüssow, H. (1999). Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J. Virol. 73, 8647-8656.

Mahdi, A. A., Sharples, G. J., Mandal, T. N. & Lloyd, R. G. (1996). Holliday junction resolvases encoded by homologous rusA genes in Escherichia coli K-12 and phage 82. J. Mol. Biol. 257, 561-573.

Matsumoto, M., Ichikawa, N., Tanaka, S., Morita, T. & Matsushiro, A. (1985). Molecular cloning of φ80 adsorption-inhibiting cor gene. Jpn. J. Genet. 60, 475-483.

Monod, C., Repoila, F., Kutateladze, M., Tetart, F. & Krisch, H. M. (1997). The genome of the pseudo T-even bacteriophages, a diverse group that resembles T4. J. Mol. Biol. 267, 237-249.

Murphy, K. C., Casey, L., Yannoutsos, N., Poteete, A. R. & Hendrix, R. W. (1987a). Localization of a DNA-binding determinant in the bacteriophage P22 Erf protein. J. Mol. Biol. 194, 105-117.

Murphy, K. C., Fenton, A. C. & Poteete, A. R. (1987b). Sequence of the bacteriophage P22 anti-RecBCD (abc) genes and properties of P22 abc region deletion mutants. Virology, 160, 456-464.

Neely, M. N. & Friedman, D. I. (1998). Functional and genetic analysis of regulatory region of coliphage H19-B: location of shiga-like toxin and lysis genes suggest a role for phage functions in toxin release. Mol. Microbiol. 28, 1255-1267.

Oberto, J., Weisberg, R. A. & Gottesman, M. E. (1989). Structure and function of the nun gene and the immunity region of the lambdoid phage HK022. J. Mol. Biol. 207, 675-693.

Plunkett, G., III, Rose, D. J., Durfee, T. J. & Blattner, F. R. (1999). Sequence of Shiga toxin 2 phage 933W from Escherichia coli O157:H7: Shiga toxin as a phage late-gene product. J. Bacteriol. 181, 1767-1778.

Popa, M. P., McKelvey, T. A., Hempel, J. & Hendrix, R. W. (1991). Bacteriophage HK97 structure: wholesale covalent cross-linking between the major head shell subunits. J. Virol. 65, 3227-3237.

Poteete, A. R. & Volkert, M. R. (1988). Activation of RecF-dependent recombination in Escherichia coli by bacteriophage lambda- and P22-encoded functions. J. Bacteriol. 170, 4379-4381.

Poteete, A. R., Sauer, R. T. & Hendrix, R. W. (1983). Domain structure and quaternary organization of the bacteriophage P22 Erf protein. J. Mol. Biol. 171, 401-418.

Reeve, J. N. & Shaw, J. E. (1979). Lambda encodes an outer membrane protein: the lom gene. Mol. Gen. Genet. 172, 243-248.

Robert, J., Sloan, S. B., Weisberg, R. A., Gottesman, M. E., Robledo, R. & Harbrecht, D. (1987). The remarkable specificity of a new transcription termination factor suggests that the mechanisms of termination and antitermination are similar. Cell, 51, 483-492.

Sanger, F., Nicklen, S. & Coulson, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl Acad. Sci. USA, 74, 5463-5467.

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F. & Petersen, G. B. (1982). Nucleotide sequence of bacteriophage lambda DNA. J. Mol. Biol. 162, 729-773.

Sawitzke, J. A. & Stahl, F. W. (1994). The phage λ orf gene encodes a trans-acting factor that suppresses Escherichia coli recO, recR, and recF mutations for recombination of λ but not of E. coli. J. Bacteriol. 176, 6730-6737.

Sharples, G. J., Corbett, L. M. & Graham, I. R. (1998). λ Rap protein is a structure-specific endonuclease involved in phage recombination. Proc. Natl Acad. Sci. USA, 95, 13507-13512.

Simon, M. N., Davis, R. W. & Davidson, N. (1971). Heteroduplexes of DNA molecules of lambdoid phages: physical mapping of their base sequence relationships by electron microscopy. In The Bacteriophage Lambda (Hershey, A. D., ed.), pp. 313-328, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Smith, M. C. M., Burns, R. N., Wilson, S. E. & Gregory, M. A. (1999). The complete genome sequence of the Streptomyces temperate phage φC31: evolutionary relationships to other viruses. Nucl. Acids Res. 27, 2145-2155.

Staden, R. (1986). The current status and portability of our sequence handling software. Nucl. Acids Res. 14, 217-231.

Susskind, M. M. & Botstein, D. (1978). Molecular genetics of bacteriophage P22. Microbiol. Rev. 42, 385-413.

Weisberg, R. A., Gottesman, M. E., Hendrix, R. W. & Little, J. W. (2000). Family values in the age of genomics: comparative analyses of temperate bacteriophage HK022. Annu. Rev. Genet. 33, 565-602.

Werts, C., Michel, V., Hofnung, M. & Charbit, A. (1994). Adsorption of bacteriophage lambda on the LamB protein of Escherichia coli K-12: point mutations in gene J of lambda responsible for extended host range. J. Bacteriol. 176, 941-947.

Wommack, K. E. & Colwell, R. R. (2000). Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64, 69-114.

Wulff, D. L., Ho, Y. S., Powers, S. & Rosenberg, M. (1993). The int genes of bacteriophages P22 and lambda are regulated by different mechanisms. Mol. Microbiol. 9, 261-271.

Yagil, E., Dolev, S., Oberto, J., Kislev, N., Ramaiah, N. & Weisberg, R. A. (1989). Determinants of site-specific recombination in the lambdoid coliphage HK022. An evolutionary change in specificity. J. Mol. Biol. 207, 695-717.

Yarmolinsky, M. & Sternberg, N. (1988). Bacteriophage P1. In The Bacteriophages (Calendar, R., ed.), vol. 1, pp. 291-438, Plenum Press, New York.

Youderian, P. & Susskind, M. M. (1980). Identification of the products of bacteriophage P22 genes, including a new late gene. Virology, 107, 258-269.

Zinder, N. & Lederberg, J. (1952). Genetic exchange in Salmonella. J. Bacteriol. 64, 679-699.

Edited by M. Gottesman

(Received 20 December 1999; received in revised form 21 March 2000; accepted 24 March 2000)