Nucleotide Escherichia - PNASProc. Natl. Acad. Sci. USA Vol. 91, pp. 11276-11280, November1994...

  • Proc. Natl. Acad. Sci. USA Vol. 91, pp. 11276-11280, November 1994. Evolution

    Nucleotide polymorphism in colicin E1 and Ia plasmids from natural isolates of Escherichia coli

    MARGARET A. RILEY*, YING TAN, AND JINPING WANG

    Department of Biology, Yale University, New Haven, CT 06511

    Communicated by Francisco J. Ayala, June 20, 1994 (received for review December 8, 1993)

    ABSTRACT We examined DNA sequence polymorphism for the colicin gene clusters of seven ColE1 and six ColIa plasmids obtained from natural isolates of Escherichia coli. These gene clusters harbor levels of nucleotide diversity ranging from 0.006 (ColIa) to 0.054 (ColE1). This level of diversity is similar to that observed for chromosomally encoded E. coli genes. However, the variance associated with these estimates is severalfold higher for the plasmid-encoded genes. This increased variance may be due to the differing plasmid population sizes. The pattern of colicin gene cluster polymorphism suggests that the two colicins are evolving in different fashions. ColE1 accumulates polymorphism at an elevated rate in the central domain of the colicin protein, while ColIa polymorphism is distributed evenly along the gene cluster. Comparison of the patterns of divergence between colicin and related proteins of ColIa and Ib and patterns of polymorphism within ColIa suggest that this gene cluster is not evolving in a neutral fashion. These data lend support to the hypothesis that colicin gene clusters may evolve under the influence of diversifying selection.

    Several studies have focused recently on the levels and patterns of nucleotide sequence polymorphism in natural isolates of Escherichia coli. These studies have examined a wide array of chromosomally determined loci as well as a smaller sample of insertion sequences (1-13). The goal is to deduce the evolutionary mechanisms that generate genotypic diversity in bacteria. Such mechanisms include recombination, genetic drift, and natural selection.

    Natural isolates of E. coli harbor an average of four plasmids ranging in size from a few to >200 kb (14). Plasmids encode functions required in the control of their replication and often encode genes ensuring their stable inheritance. In addition, some encode functions useful to their bacterial hosts, such as antibiotic and heavy metal resistance, toxin and bacteriocin production, and restriction modification systems (15, 16).

    Colicin plasmids (Col plasmids) of E. coli serve as the focus of this work. Colicins are toxic proteins produced by and active against E. coli and related bacteria. The genes encoding colicins and colicin-related proteins, such as the immunity protein, which provides specific immunity to the action of a colicin, and the lysis protein, which is involved in cell lysis, are encoded exclusively on plasmid replicons (17, 18). Col plasmids are found at high frequencies in natural populations of E. coli (19).

    In this paper, we report the entire nucleotide sequence of the colicin E1 gene cluster, including the colicin, immunity, and lysis genes, for seven ColE1 plasmids and the entire nucleotide sequence of the colicin Ia gene cluster, including the colicin and immunity genes, for six ColIa plasmids isolated from natural strains of E. coli. We also provide the colicin E1 and Ia gene cluster sequences of laboratory standard ColE1 (pColE1-K30) and ColIa (pAPBZ106) plasmids (20).† The levels of plasmid-encoded DNA sequence variability are compared to those described for chromosomally determined loci from the same collection of E. coli isolates (1-13). In addition, the phylogenetic relationships of the E. coli hosts, their ColE1 or ColIa plasmids, and the encoded colicin gene clusters are compared. These data allow us to further address the hypothesis that the diversity of colicin gene clusters in natural populations of E. coli is the result of positive, diversifying selection (21, 22).

    FIG. 1. Genetic organization of ColE1 (A) and ColIa (B) gene clusters. Boxes indicate colicin, immunity, and lysis genes within a gene cluster. Connecting lines indicate intergenic and flanking sequences.

    MATERIALS AND METHODS

    Plasmid Isolates. From the E. coli reference (ECOR) collection (23), we isolated 7 strains that carry the ColE1 plasmid and 5 strains that carry the ColIa plasmid (19). The 12 host strains, which have been analyzed by multilocus enzyme electrophoresis (38 loci) (24), are as follows: ColE1, EC12, EC24, EC31, EC39, EC40, EC50, and EC71; ColIa, EC3, EC14, EC15, EC28, and EC34. One further ColIa-harboring strain (IHE3113) was isolated from the Achtman collection (25). In addition, the ColE1 (pColE1-K30) and ColIa (pAPBZ106) plasmids from the Pugsley colicin plasmid collection were included in this study (20).

    Nucleotide Sequencing. Double-stranded DNA was prepared for each ColE1 plasmid, each of which is ~6 kb, by the alkaline lysis miniprep method (26). This DNA was used as a template for double-stranded DNA sequencing by the dideoxynucleotide chain-termination method supplied with Sequenase (United States Biochemical). The colicin Ia gene cluster, which is encoded on a high molecular weight plasmid of ~100 kb, was amplified from genomic DNA with PCR primers designed based on the published ColIa gene cluster sequence (27) as follows: 5' primer, 5'-GGAATTCTCTTGACATGCCATTTTCTCCTT-3' (bp 826-846 from ref. 27); 3' primer, 5'-GGAATTCCCGCCACATCTTTTTGCTGTCCA-3' (bp 3358-3336 from ref. 27). PCR fragments were subcloned into M13 and single-stranded template was prepared and sequenced by the dideoxynucleotide chain-termination method as described in the Sequenase manual (United States Biochemical). Oligonucleotide primers for colicin gene clusters of ColE1 and ColIa were constructed at ~250-bp intervals based on available DNA sequence information (27, 28).

    Computer Analysis. DNA sequence data were assembled and edited with MACVECTOR programs [MACVECTOR 4.0 (1992), IBI]. Phylogenetic analysis was conducted by the neighbor-joining method (29).

    Abbreviations: df, degrees of freedom; ECOR, E. coli reference; MLEE, multilocus enzyme electrophoresis; RFLP, restriction fragment length polymorphism.
    *To whom reprint requests should be addressed.
    †The sequences reported in this paper have been deposited in the GenBank data base (accession nos. U15619-U15633).

    The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

    RESULTS

    Nucleotide Polymorphism. DNA sequences of the colicin, immunity, and lysis genes were determined for seven natural isolates of the ColE1 plasmid of E. coli and a ColE1 laboratory standard (pColE1-K30) (20). These sequences were compared to the previously published ColE1 gene cluster sequence (designated E1) (28). The organization of the region sequenced is given in Fig. 1A.

    There are 309 polymorphic sites in the 2855-bp region examined among the nine ColE1 plasmids (Fig. 2; Table 1). ColE1 plasmids isolated from E. coli host strains EC31 and EC71 share nearly identical DNA sequences with the previously published ColE1 gene cluster sequence (28). These strains differ, on average, at 2.7 nt and have an average nucleotide diversity (or base-pair heterozygosity) (30) of 0.001. The colicin gene clusters from the remaining ColE1 plasmids (isolated from host strains EC12, -24, -39, -40, and -50 and pColE1-K30) differ at an average of 4.9 nt and have an average nucleotide diversity of 0.002. Thus, the two ColE1 plasmid groups have nearly equal levels of within-group diversity. However, they are easily distinguished, with an average of 153 nt differing in the total sample of sequences and an average nucleotide diversity of 0.054. The total level of diversity for ColE1 gene clusters is 50-fold higher than the level observed within either group.

    DNA sequences of the colicin and immunity genes were determined for six natural isolates of the ColIa plasmid of E. coli and a laboratory standard (pAPBZ106) (20). These sequences were compared to the previously published ColIa gene cluster sequence (designated Ia) (27). The organization of the region sequenced is given in Fig. 1B.

    There are 43 polymorphic sites in the 2533-bp region examined (Fig. 3; Table 1). ColIa plasmids differ, on average, at 15.21 nt and have an average nucleotide diversity of 0.006. Three ColIa plasmids are clearly more closely related (based on shared polymorphisms) relative to the remaining ColIa plasmids. This subgroup consists of ColIa plasmids isolated from hosts EC3, EC28, EC34 and the previously published ColIa sequence (27). These plasmids differ, on average, at 2.83 nt and have an average nucleotide diversity of 0.001. The remaining sequences, isolated from hosts EC14, EC15, IHE3113, and the standard ColIa plasmid (pAPBZ106) (20), differ, on average, at 14.83 nt and have an average nucleotide diversity of 0.006.

    The total level of diversity observed among the ColIa sequences is 9-fold lower than the total level of diversity observed among the ColE1 sequences. However, within the two groups of ColE1 sequences and the isolated group of ColIa sequences, the levels of diversity are not significantly different [G = 2.836; degrees of freedom (df) = 2; P > 0.30].

    Synonymous and Nonsynonymous Polymorphisms. Table 1 summarizes the levels of noncoding, synonymous, and nonsynonymous polymorphism for each region sequenced in ColE1 and ColIa and provides an estimate of nucleotide diversity for synonymous (Ks) and nonsynonymous (Kn) sites for each gene. For the E1 colicin, immunity, and lysis genes, there are significant differences in estimates of Ks (G = 12.54; df = 2; P < 0.01) but not Kn (G = 5.78; df = 2; P < 0.10). The E1 immunity gene has a reduced level of Ks relative to the colicin and lysis genes. Estimates of Ks and Kn are not significantly different among the Ia colicin and immunity genes (synonymous: G = 1.56; df = 1; P > 0.2; nonsynonymous: G = 0.65; df = 1; P > 0.30).

    FIG. 2. Polymorphic nucleotides and amino acids among the ColE1 gene clusters and encoded proteins. Numbering of nucleotides is given above DNA sequences. Boldface lettering indicates the region (5', col, imm, lys, 3'). Asterisks distinguish between coding/noncoding sequences. Sequences aligned: E1, EC71, EC31, EC24, EC39, EC40, EC12, EC50, and pColE1-K30.

    Table 1. Nucleotide polymorphism and diversity in the ColE1 and ColIa gene clusters

    Region  Total  Total  Total   KT               Syn    Syn   Syn     Ks               Nonsyn  Nonsyn  Nonsyn  Kn
            sites  poly   % poly                   sites  poly  % poly                   sites   poly    % poly
    ColE1
    5'        150     22   15.67  0.070 ± 0.03
    col      1566    200   12.77  0.060 ± 0.02       333   124   37.24  0.180 ± 0.08       1233      76    6.16  0.03 ± 0.01
    imm       339     21    6.19  0.030 ± 0.01        62    10   16.13  0.080 ± 0.01        277      11    3.97  0.02 ± 0.01
    inter      47      2    4.26
    lys       135     10    7.41  0.040 ± 0.91        31     8   25.81  0.130 ± 0.05        104       2    1.92  0.01 ± 0.00
    3'        618     54    8.74  0.040 ± 0.02
    ColIa
    5'        104      3    2.88  0.007 ± 0.003
    col      1881     33    1.75  0.006 ± 0.002      577    16    2.77  0.009 ± 0.004      1304      17    1.30  0.004 ± 0.002
    inter      21      0    0.00
    imm       336      6    1.75  0.008 ± 0.003       69     4    5.80  0.029 ± 0.012       267       2    0.75  0.003 ± 0.001
    3'        191      1    0.52  0.003 ± 0.001

    Regions compared: 5', 5' untranslated region; col, colicin gene; imm, immunity gene; inter, intergenic region; lys, lysis gene; 3', 3' untranslated region. Total sites designates number of base pairs examined; total poly (polymorphisms) designates number of polymorphic base pairs; total % poly designates percentage of base pairs polymorphic; KT indicates mean nucleotide diversity (number of substitutions per site); syn, synonymous; nonsyn, nonsynonymous.
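    The region counts in Table 1 and the diversity values quoted in the Results can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not the authors' procedure: the dictionaries are transcribed from Table 1, the function names are invented for this example, and nucleotide diversity is assumed to be the mean number of pairwise differences divided by the number of sites compared.

```python
from itertools import combinations

# (sites, polymorphic sites) per region, transcribed from Table 1.
COLE1_REGIONS = {"5'": (150, 22), "col": (1566, 200), "imm": (339, 21),
                 "inter": (47, 2), "lys": (135, 10), "3'": (618, 54)}
COLIA_REGIONS = {"5'": (104, 3), "col": (1881, 33), "inter": (21, 0),
                 "imm": (336, 6), "3'": (191, 1)}


def totals(regions):
    """Sum base pairs examined and polymorphic sites over all regions."""
    sites = sum(s for s, _ in regions.values())
    poly = sum(p for _, p in regions.values())
    return sites, poly


def percent_polymorphic(sites, poly):
    """Percentage of examined base pairs that are polymorphic."""
    return 100.0 * poly / sites


def nucleotide_diversity(aligned):
    """Mean pairwise differences per site for equal-length aligned sequences.

    Assumption for this sketch: every column is compared, with no special
    handling of gaps or ambiguous bases.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / length


# Totals implied by Table 1, to compare with the 2855-bp and 2533-bp regions.
print("ColE1 (sites, polymorphic):", totals(COLE1_REGIONS))
print("ColIa (sites, polymorphic):", totals(COLIA_REGIONS))

# Diversity implied by the reported mean pairwise differences.
print("ColE1:", round(153 / 2855, 3))
print("ColIa:", round(15.21 / 2533, 3))
```

    The per-region percentages can be recomputed the same way with `percent_polymorphic`, which makes transcription slips in a flattened table easier to spot.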

    Distribution of Polymorphic Amino Acids in the Colicin, Immunity, and Lysis Proteins. The distribution of amino acid polymorphisms for each protein is given in Figs. 2 and 3. Amino acid polymorphisms are not distributed evenly along the E1 colicin protein when the sequence is divided into nine equally sized blocks (G = 52.79; df = 8; P 0.05). The total level of amino acid polymorphism is 5-fold higher in ColE1 versus ColIa.

    The synonymous polymorphisms are evenly distributed in both colicin genes (E1, G = 14.96; df = 8; P > 0.05; Ia, G = 11.48; df = 8; P > 0.1). Synonymous and nonsynonymous polymorphic sites are distributed evenly along the E1 and Ia immunity genes (E1: nonsynonymous, G = 7.73; df = 3; P > 0.05; synonymous, G = 0.17; df = 3; P > 0.98; Ia: nonsynonymous, G = 2.81; df = 3; P > 0.30; synonymous, G = 6.00; df = 3; P > 0.10).

    The Kn/Ks ratio for each protein is given in Table 2. This ratio indicates the relative degree of functional constraint experienced by a protein if it is evolving in a predominantly neutral fashion. Under this assumption, the ColE1 lysis protein experiences an elevated level of functional constraint relative to all other proteins examined here. Furthermore, the degree of constraint inferred for the colicin and immunity proteins is in the opposite direction in ColE1 and ColIa.

    Comparison of Levels of Polymorphism between Chromosomal and Plasmid Genes. The ECOR collection has served as a focus of DNA sequence polymorphism surveys of chromosomally encoded genes. More than nine protein-encoding loci have been examined (1-13) and the level of Ks varies between 0.044 and 0.079 (1-13), with a mean value of 0.057 (Table 2). The level of synonymous variation observed for the Col plasmid-encoded genes ranges from 0.009 (in the Ia immunity gene) to 0.18 (in the E1 colicin gene), with a mean value of 0.066 (Table 2). The mean values of Ks for chromosomal and plasmid-encoded genes are not significantly different (χ² = 0.006; df = 1; P > 0.95). In contrast, the variances around the means are significantly different (plasmid s² = 0.004; chromosomal s² = 0.0002; F = 26.69; P < 0.01). The variances are compared by a test of homoscedasticity with the logarithm of transformed values of mean Ks values. We can further distinguish the two classes of plasmid-encoded genes, i.e., ColE1 versus ColIa. The mean value of Ks for ColE1-encoded genes (0.13) does not differ significantly from the mean for chromosomal genes (χ² = 0.037; df = 1; P > 0.9), and the variance associated with these estimators does not differ significantly (F = 3.439; P = 0.27).

    FIG. 3. Polymorphic nucleotides and amino acids among ColIa gene clusters and encoded proteins. Numbering of nucleotides is given above DNA sequences. Boldface lettering indicates the region (5', col, imm, 3'). Sequences aligned: Ia, EC28, EC3, EC34, EC14, EC15, IHE3113, and pABZ106.
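    The tests of even distribution reported in the Results are G-tests of goodness of fit: polymorphic sites are counted in equally sized blocks and compared with a uniform expectation. A minimal Python sketch of that calculation follows. It is an illustration under stated assumptions, not the authors' code: the block counts are hypothetical (the per-block counts are not given in the text), the function names are invented here, and the P-value helper uses a closed form that holds only for even degrees of freedom.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical counts of polymorphic sites in nine equally sized blocks.
block_counts = [2, 3, 5, 14, 18, 16, 8, 6, 4]
uniform = [sum(block_counts) / len(block_counts)] * len(block_counts)

g = g_statistic(block_counts, uniform)
df = len(block_counts) - 1  # nine blocks give df = 8
print("G =", round(g, 2), "df =", df, "P =", chi2_sf_even_df(g, df))
```

    With df = 8 the 5% critical value is about 15.51, so a reported G of 14.96 falls just short of significance while G = 52.79 is far beyond it.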

    Table 2. Comparison of nucleotide diversity among chromosomal- and plasmid-encoded genes

    Gene        Ks      Kn/Ks
    Plasmid-encoded genes*
    E1 col      0.180   0.17
    E1 imm      0.080   0.25
    E1 lys      0.130   0.08
    Ia col      0.009   0.44
    Ia imm      0.029   0.10
    Chromosomal-encoded genes†
    celC        0.049   0.03
    crr         0.051   0.00
    gutB        0.044   0.13
    phoA        0.071   0.05
    trpB        0.049   0.00
    trpC        0.079   0.05

    *This study.
    †From Hall and Sharp (11).
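    The Kn/Ks column of Table 2 follows directly from the Ks and Kn estimates in Table 1, and the chromosomal mean and variance quoted in the text follow from the Ks column. The short Python sketch below recomputes them; the variable names are chosen for this example, and the variance is assumed to be the ordinary sample variance of the untransformed Ks values.

```python
import statistics

# (Ks, Kn) point estimates for the plasmid-encoded genes, from Table 1.
PLASMID_ESTIMATES = {
    "E1 col": (0.180, 0.03),
    "E1 imm": (0.080, 0.02),
    "E1 lys": (0.130, 0.01),
    "Ia col": (0.009, 0.004),
    "Ia imm": (0.029, 0.003),
}

# Ks for the chromosomal genes listed in Table 2.
CHROMOSOMAL_KS = [0.049, 0.051, 0.044, 0.071, 0.049, 0.079]

# Kn/Ks per gene, rounded to two decimals as in Table 2.
ratios = {gene: round(kn / ks, 2)
          for gene, (ks, kn) in PLASMID_ESTIMATES.items()}
for gene, ratio in ratios.items():
    print(gene, ratio)

# Summary statistics for the chromosomal Ks values.
chromosomal_mean = statistics.mean(CHROMOSOMAL_KS)
chromosomal_var = statistics.variance(CHROMOSOMAL_KS)  # sample variance
print("chromosomal mean Ks:", round(chromosomal_mean, 3))
print("chromosomal variance:", round(chromosomal_var, 4))

# Mean Ks over the three ColE1 genes.
cole1_mean = statistics.mean(ks for gene, (ks, _) in PLASMID_ESTIMATES.items()
                             if gene.startswith("E1"))
print("ColE1 mean Ks:", round(cole1_mean, 2))
```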


    FIG. 4. Evolutionary trees, constructed by the neighbor-joining method (29) on the basis of polymorphic nucleotides, are shown for ColE1 (A) and ColIa (B) gene clusters, together with comparable trees summarizing the genomic relationships of the ColE1 and ColIa host strains from the E. coli reference (ECOR) collection, as indexed by multilocus enzyme electrophoresis, and the ColE1 plasmid relationships, as indexed by restriction fragment length polymorphism (RFLP) analysis. MLEE, multilocus enzyme electrophoresis. Panels: (A) ECOR Host MLEE Tree, ColE1 Plasmid RFLP Tree, ColE1 Sequence Tree; (B) ECOR Host MLEE Tree, ColIa Sequence Tree. Scale bars indicate distance.

    The mean value of Ks for ColIa-encoded genes (0.018) also does not differ significantly from the mean for chromosomal genes (χ² = 0.019; df = 1; P > 0.9). However, in this case the variances are significantly different (F = 11.273; P < 0.05).

    Evolutionary Relationships Among Host Strains and Their Plasmids. Phylogenetic trees, constructed by the neighbor-joining method (29) on the basis of polymorphic nucleotides, are shown in Fig. 4 for ColE1 and ColIa gene clusters, together with comparable trees summarizing the genomic relationships of the ColE1 and ColIa host strains, as indexed by multilocus enzyme electrophoresis (MLEE) (24), and the ColE1 plasmid relationships, as indexed by restriction fragment length polymorphism analyses (RFLP) (19). The branching patterns of the host-, plasmid-, and colicin cluster-based trees for E1 and for the host- and colicin cluster-based trees for Ia show numerous incongruencies. ECOR reference collection strain pairs, Col plasmid pairs, and colicin gene cluster pairs EC12 and -24, EC39 and -40, and EC31 and -71 cluster together in each of the three trees. However, the relationships among these three pairs of hosts, plasmids, and colicin gene clusters are different when compared among the three trees. For example, the branching pattern in the host tree indicates that EC12 and -24 are well separated from the remaining hosts. In the plasmid tree, EC50 clusters tightly with EC12 and -24. In the sequence tree, EC39, -40, and -50 cluster tightly with EC12 and -24. These incongruencies may indicate instances of horizontal transfer, positive selection, or both, in the evolution of ColE1 and ColIa gene clusters.

    DISCUSSION

    Two groups of ColE1 gene clusters are distinguished in this study, differing at 5.4% of their nucleotides. Within each group, the level of variation is sharply reduced, with

    to the ancestral colicin while evolving an additional immunity function, one that confers immunity to the evolved colicin. The ancestral colicin, however, is not immune to the evolved colicin. Thus, the unique variant will have a "super" immunity function and be rapidly driven into the population at the expense of the ancestral colicin during times when colicinogenicity is selectively favored.

    In this scenario of colicin diversification, two mutations (i.e., one in the immunity gene and one in the colicin gene) are required. However, these mutations do not have to occur simultaneously. If the first mutation occurred in the immunity gene, it would be either selectively neutral or, possibly, selectively favored because of the increased spectrum of immunity such a mutation would confer. It is worth noting that naturally occurring Col plasmids often have additional immunity genes linked to their colicin cluster (31). Furthermore, immunity mutants have been characterized that confer immunity to several different colicins with no apparent deleterious effect (32).

    Repeated rounds of this form of diversifying selection are predicted to result in elevated levels of both synonymous and nonsynonymous divergence in the immunity gene and in the immunity binding region of the colicin gene when closely related colicins are examined (21, 22). The elevated level of nonsynonymous divergence is due to the positive selection for unique mutations in the immunity protein and the binding region of the colicin protein. As described above, these immunity-function mutations are selected as pairs. Recombination that breaks the link between the pairs is lethal. Mutations that occur between the selected pairs or closely linked to the pairs will be dragged to fixation during the selective sweeps. This hitchhiking effect also elevates the level of synonymous variation. The pattern of inflated divergence of both synonymous and nonsynonymous substitution was observed in all three pairs of colicin gene clusters (E3/E6, E2/E9, and Ia/Ib) for which the colicin gene clusters can be reliably aligned.

    A prediction for a neutrally evolving locus is that the level of polymorphism should be positively correlated with the level of divergence. For ColIa, the synonymous and nonsynonymous polymorphisms are evenly distributed along the colicin gene cluster. This pattern is quite different from that observed between ColIa and ColIb, where the most rapid divergence occurs in a narrow region encompassing the colicin binding region and the immunity gene. As there are no colicins closely related to ColE1, this sort of comparison cannot be made. However, it should be noted that the pattern of polymorphism observed at ColE1 is quite different from that observed at ColIa. For ColE1, the synonymous polymorphism is evenly distributed along the gene cluster, while the nonsynonymous polymorphism is clustered in the central region of the colicin gene.

    Several statistical tests have been developed that allow one to compare the patterns of polymorphism and divergence to those predicted for a neutrally evolving locus (33, 34). One of these (34) compares the ratio Ks/Kn for polymorphism with the ratio of synonymous to nonsynonymous divergence. If the two ratios are the same, one cannot reject the null hypothesis of neutrality. We have applied this test to the ColIa colicin gene by using polymorphism data for the colicin gene and divergence data obtained from a comparison of the Ia and Ib colicin genes. We fail to detect a departure from neutrality (G = 0.34; df = 1; P > 0.5). Given the high level of divergence in the immunity genes of Ia and Ib, we are unable to test the immunity gene for departures from neutral predictions. Unfortunately, it is precisely the immunity region that is predicted to experience the strongest positive selection forces (21, 22).

    Two questions are raised by this study. First, does the observed discrepancy between the patterns of polymorphism and divergence for ColIa support the hypothesis of diversifying selection acting on the immunity region of certain colicin gene clusters (22)? Due to the high level of divergence observed between ColIa and ColIb immunity genes, it may be necessary to examine patterns of DNA sequence polymorphism in more closely related colicin pairs, e.g., ColE3 and ColE6 (21, 22). Second, why does ColE1 accumulate polymorphism in the central domain of the colicin protein at an elevated rate relative to the remainder of the protein? This pattern was unexpected given results from a previous study that examined features of colicin protein evolution. This previous work indicated that, generally, the central domain of the colicin protein is more highly constrained in evolution than the remainder of the protein (21), exactly the opposite of what is observed for the accumulation of polymorphism in the colicin E1 protein. Additional surveys of Col plasmid polymorphism are required to determine whether this E1 pattern is a general feature of the evolution of Col gene clusters or a pattern restricted to the evolution of ColE1.
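    The comparison of polymorphism and divergence ratios discussed above (34) can be illustrated with a G-test of independence on a 2 x 2 table of counts. The following is a minimal sketch, assuming a plain G statistic with no continuity correction; the function names and the example counts are hypothetical and are not the data analyzed in this study. For one degree of freedom, the chi-square tail probability reduces to a complementary error function, so no statistics library is needed.

```python
import math


def p_value_df1(g):
    """Upper-tail probability of a chi-square variate with 1 df."""
    # For df = 1, P(X > g) = erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def g_test_2x2(syn_poly, nonsyn_poly, syn_div, nonsyn_div):
    """G-test of independence on a 2 x 2 table of site counts.

    Rows: polymorphic sites within a sample, fixed differences between
    two gene clusters. Columns: synonymous, nonsynonymous.
    Returns (G, P) with df = 1 and no continuity correction.
    """
    table = [[syn_poly, nonsyn_poly], [syn_div, nonsyn_div]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)

    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # a zero cell contributes nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    return g, p_value_df1(g)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call signature.
    g, p = g_test_2x2(syn_poly=12, nonsyn_poly=6, syn_div=30, nonsyn_div=11)
    print(f"G = {g:.2f}, df = 1, P = {p:.3f}")
```

    When the two ratios are identical, the observed and expected counts coincide and G is zero, which corresponds to failing to reject neutrality. Passing a G of 0.34 to `p_value_df1` gives a probability above 0.5, in line with the P > 0.5 reported above for df = 1.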

    This study was supported by National Institutes of Health Grant GM47471-02 to M.A.R. and a grant from the General Reinsurance Co. to M.A.R.

    1. Milkman, R. & Crawford, I. P. (1983) Science 221, 378-380.
    2. Hartl, D. L., Medhora, M., Green, L. & Dykhuizen, D. E. (1986) Philos. Trans. R. Soc. London B 312, 191-204.
    3. DuBose, R. F., Dykhuizen, D. E. & Hartl, D. L. (1988) Proc. Natl. Acad. Sci. USA 85, 7036-7040.
    4. Milkman, R. & Stoltzfus, A. (1988) Genetics 120, 359-366.
    5. Stoltzfus, A., Leslie, J. F. & Milkman, R. (1988) Genetics 120, 345-358.
    6. Milkman, R. & Bridges, M. M. (1990) Genetics 126, 505-517.
    7. Milkman, R. & Bridges, M. M. (1990) Genetics 12, 518-532.
    8. Bisercic, M., Feutrier, J. Y. & Reeves, P. R. (1991) J. Bacteriol. 173, 3894-3900.
    9. Dykhuizen, D. E. & Green, L. (1991) J. Bacteriol. 173, 7257-7268.
    10. Nelson, K., Whittam, T. S. & Selander, R. K. (1991) Proc. Natl. Acad. Sci. USA 88, 6667-6671.
    11. Hall, B. G. & Sharp, P. M. (1992) Mol. Biol. Evol. 9, 654-665.
    12. Lawrence, J. G., Ochman, H. & Hartl, D. L. (1992) Genetics 131, 9-20.
    13. Nelson, K. & Selander, R. K. (1992) J. Bacteriol. 174, 6886-6895.
    14. Selander, R. K., Caugant, D. A. & Whittam, T. S. (1987) in Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, eds. Neidhardt, F. C., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M. & Umbarger, H. E. (Am. Soc. for Microbiol., Washington, DC), Vol. 2, pp. 1625-1648.
    15. Broda, P. (1979) Plasmids (Freeman, San Francisco).
    16. Reanney, D. (1976) Bacteriol. Rev. 40, 552-590.
    17. Konisky, J. (1982) Annu. Rev. Microbiol. 36, 125-144.
    18. Pugsley, A. (1984) Microbiol. Sci. 1, 168-175, 203-205.
    19. Riley, M. A. & Gordon, D. M. (1992) J. Gen. Microbiol. 138, 1345-1352.
    20. Pugsley, A. (1985) J. Gen. Microbiol. 131, 369-376.
    21. Riley, M. A. (1993) Mol. Biol. Evol. 10, 1380-1395.
    22. Riley, M. A. (1993) Mol. Biol. Evol. 10, 1048-1059.
    23. Ochman, H. & Selander, R. K. (1984) J. Bacteriol. 157, 690-693.
    24. Whittam, T. S., Ochman, H. & Selander, R. K. (1983) Proc. Natl. Acad. Sci. USA 80, 1751-1755.
    25. Achtman, M., Mercer, A., Kusecek, B., Pohl, A., Heuzenroeder, M., Aaronson, W., Sutton, A. & Silver, R. P. (1983) Infect. Immun. 39, 315-335.
    26. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY).
    27. Mankovich, J., Hsu, C. & Konisky, J. (1986) J. Bacteriol. 168, 228-236.
    28. Chan, P. T., Ohmori, H., Tomizawa, J. & Lebowitz, J. (1985) J. Biol. Chem. 260, 8925-8935.
    29. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406-425.
    30. Nei, M. (1987) Molecular Evolutionary Genetics (Columbia Univ. Press, New York).
    31. Chak, K. F. & James, R. (1984) J. Gen. Microbiol. 130, 701-710.
    32. Masaki, H., Akutsu, A., Uozumi, T. & Ohta, T. (1991) Gene 107, 133-138.
    33. Tajima, F. (1989) Genetics 123, 597-601.
    34. McDonald, J. H. & Kreitman, M. (1991) Nature (London) 351, 652-654.
