
JOURNAL OF BACTERIOLOGY, Nov. 2008, p. 7060–7067 Vol. 190, No. 21 0021-9193/08/$08.00+0 doi:10.1128/JB.01552-07 Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Molecular Phylogeny of the Salmonellae: Relationships among Salmonella Species and Subspecies Determined from Four Housekeeping Genes and Evidence of Lateral Gene Transfer Events†

J. R. McQuiston,¹,² S. Herrera-Leon,³ B. C. Wertheim,⁴‡ J. Doyle,¹‡ P. I. Fields,² R. V. Tauxe,¹,² and J. M. Logsdon, Jr.¹,⁴*

¹Program in Population Biology, Ecology and Evolution, Emory University, Atlanta, Georgia 30322; ²Division of Foodborne, Bacterial and Mycotic Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia 30333; ³Laboratorio Nacional de Referencia de Salmonella y Shigella, Centro Nacional de Microbiologia, Instituto de Salud Carlos III, 28220 Majadahonda, Madrid, Spain; and ⁴Department of Biology, Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa City, Iowa 52242

Received 26 September 2007/Accepted 15 August 2008

The salmonellae are a diverse group of bacteria within the family Enterobacteriaceae that includes two species, Salmonella enterica and Salmonella bongori. To characterize the phylogenetic relationships of the species and subspecies of Salmonella, we analyzed four housekeeping genes, gapA, phoP, mdh, and recA, comprising 3,459 bp of nucleotide sequence data for each isolate sequenced. Sixty-one isolates representing the most common serotypes of the seven subspecies of Salmonella enterica and six isolates of Salmonella bongori were included in this study. We present a robust phylogeny of the Salmonella species and subspecies that clearly defines the lineages comprising the diphasic and monophasic subspecies. We also obtained evidence of intersubspecies lateral gene transfer of the housekeeping gene recA, which had not previously been reported.

The salmonellae are gram-negative gammaproteobacteria and members of the family Enterobacteriaceae that are often pathogenic to humans (8). They are intestinal parasites and intracellular pathogens in many mammalian hosts but are also found in many other hosts, including birds, reptiles, amphibians, and plants. Nontyphoidal salmonellae cause an estimated 1.4 million cases of salmonellosis in the United States each year, including 15,000 hospitalizations and 400 deaths (37).

The taxonomic classification and nomenclature of the Salmonella have been controversial for decades. Judicial Opinion 80 of the Judicial Commission of the International Committee on Systematics of Prokaryotes brought some clarity (12). The classification used in this study follows that proposed by Le Minor and Popoff (16), Reeves et al. (27), and Tindall et al. (36), the last being a combination of the two former schemes. The nomenclature reflects the differentiation of the Salmonella subspecies on the basis of phenotypic traits, such as carbon source utilization, and this differentiation has also been validated to a considerable extent by DNA-DNA hybridization (5). Subspecies are determined by the presence or absence of 11 biochemical traits (18). Currently, the Salmonella are divided into two species, Salmonella enterica and Salmonella bongori (8, 12, 28). Salmonella enterica is further divided into six subspecies, categorized by Tindall et al. (36) as follows: Salmonella enterica subsp. enterica (subsp. I), Salmonella enterica subsp. salamae (subsp. II), Salmonella enterica subsp. arizonae (subsp. IIIa), Salmonella enterica subsp. diarizonae (subsp. IIIb), Salmonella enterica subsp. houtenae (subsp. IV), and Salmonella enterica subsp. indica (subsp. VI). Subspecies VII was described by Boyd et al. (2) on the basis of multilocus enzyme electrophoresis (MLEE) data; however, this subspecies cannot be identified by unique biochemical properties. The group originally identified as subsp. V (Salmonella subsp. bongori) is now recognized as the separate species Salmonella bongori (27). In this study, we designate the S. enterica subspecies with Roman numerals (i.e., I to VII).

In addition to the taxonomic classification into subspecies, the salmonellae are further subdivided by serotype using a subtyping method based on two surface structures, the O antigen of the lipopolysaccharide and the flagellar or H antigen. This method has been invaluable for understanding the epidemiology of Salmonella. The combination of subspecies, 46 O groups, and 114 H antigens accounts for all recognized serotypes of Salmonella (23, 24).

The most frequently encountered subspecies is Salmonella enterica subsp. I. Found primarily in mammals, this subspecies is the most common cause of human disease (4). The other six subspecies of Salmonella enterica, as well as Salmonella bongori, are found primarily in nonhuman hosts and cause only occasional disease in humans. Of the 2,541 total serotypes, 1,504 are in Salmonella enterica subsp. I (24). Approximately 1% of Salmonella infections each year are due to subspecies other than subsp. I; in 2005, 36,183 Salmonella isolates were reported to the national Salmonella surveillance system (4).

Many, but not all, salmonellae express two independent yet coordinately regulated flagellin loci (fliC and fljB) with distinct protein and antigenic structures. This expression of two separate antigens is unique to Salmonella; it was recognized before the nature of flagella was known (13) and was described as “phases.” Thus, salmonellae that can express two antigens are termed “diphasic” and are capable of “phase variation” with respect to their flagellar antigen. The expression of these two loci is regulated by a switch mechanism (hin) so that only one type of flagellin protein is expressed at a time (33). This diphasic characteristic, however, is limited to four of the Salmonella enterica subspecies (I, II, IIIb, and VI), whereas subspecies IIIa, IV, and VII, as well as Salmonella bongori, have only one flagellin locus (fliC) and are considered “monophasic.” Specific serotypes within the diphasic subspecies can also be monophasic.

* Corresponding author. Mailing address: Department of Biology, Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa City, IA 52242. Phone: (319) 335-1082. Fax: (319) 335-1069.

† Supplemental material for this article may be found at http://jb.asm.org/.

‡ B.C.W. and J.D. contributed equally to this study.

Published ahead of print on 29 August 2008.

A variety of methods have been used to examine the phylogenetic history of Salmonella in previous studies. In 1973, Crosa et al. (5) used DNA dissociation measurements from DNA-DNA hybridization to define the species and subspecies of Salmonella and to differentiate them from other members of the Enterobacteriaceae. Two more recent studies have helped clarify the phylogeny of the Salmonella subspecies. Boyd et al. (2) defined the relationships of Salmonella on the basis of MLEE and DNA sequence analysis of housekeeping and invasion genes. Porwollik et al. (26) used microarray analysis of gene presence or absence to compare Salmonella subspecies and serotypes. These studies, summarized in Fig. 1a to d, reached similar conclusions, with some notable exceptions. The phylogeny based on MLEE data (Fig. 1a) conflicted at many points with the DNA sequence-based phylogenies (Fig. 1b and d) and with a more recent phylogeny based on microarray data (Fig. 1c). The MLEE data grouped the diphasic (i.e., containing two flagellin loci) subsp. II with subsp. IV and VII, both of which are monophasic; the MLEE data also grouped subsp. IIIa (monophasic) with I, IIIb, and VI, which are all generally diphasic. If correct, this topology would require multiple acquisitions or losses of the second flagellin locus. The sequence-based trees for the invasion genes (Fig. 1d) and housekeeping genes (Fig. 1b) separated the monophasic from the diphasic subspecies, although their topologies differ slightly. The microarray study by Porwollik et al. (26) is in close agreement with the housekeeping gene data set; one prominent difference, however, is the relationship of subsp. IIIa to the diphasic subspecies.
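The parsimony argument above (one change versus several acquisitions or losses of the second flagellin locus) can be made concrete with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a character needs on a given tree. The Python sketch below is illustrative only: both topologies are simplified, hypothetical renderings (the "MLEE-like" tree encodes only the groupings described in the text, and the branching order within each group is arbitrary), and each subspecies is coded by its typical flagellar phase (D, diphasic; M, monophasic), ignoring monophasic serotypes within diphasic subspecies.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree as nested tuples; leaves are subspecies labels.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Typical flagellar phase per lineage: D = diphasic, M = monophasic.
PHASE: Dict[str, str] = {
    "bongori": "M", "IIIa": "M", "IV": "M", "VII": "M",
    "IIIb": "D", "II": "D", "VI": "D", "I": "D",
}

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum number of changes) for a subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    # Disjoint state sets: one change is required somewhere below this node.
    return lset | rset, lcost + rcost + 1

# Housekeeping-gene-like topology: monophasic lineages branch first and the
# diphasic subspecies form one clade (internal order within it is arbitrary).
concatenated: Tree = ("bongori", ("IIIa", (("IV", "VII"), ("IIIb", ("II", ("VI", "I"))))))

# MLEE-like topology (hypothetical resolution): II with IV and VII; IIIa with I, IIIb, VI.
mlee_like: Tree = ("bongori", (("II", ("IV", "VII")), (("IIIa", "IIIb"), ("VI", "I"))))

if __name__ == "__main__":
    for name, tree in [("concatenated", concatenated), ("MLEE-like", mlee_like)]:
        _, changes = fitch(tree, PHASE)
        print(f"{name}: minimum phase changes = {changes}")
```

With these inputs the housekeeping-gene-like topology needs a single change, whereas the MLEE-like grouping needs three, which is the sense in which the MLEE topology implies multiple gains or losses of the fljB locus.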

Other studies of the Enterobacteriaceae have used DNA sequence-based approaches to dissect the natural history of these organisms. Dauga (6) and Fukushima et al. (9) each compared phylogenies of the DNA gyrase subunit gene gyrB with those of 16S rRNA and found that the relationships inferred from the two were not always congruent within the Enterobacteriaceae. Along the same lines, Roggenkamp (29) compared oriC with 16S rRNA and found that oriC gave robust phylogenies for species within the Enterobacteriaceae. Paradis et al. (22) compared tuf and atpD phylogenies with 16S rRNA phylogenies within the Enterobacteriaceae and found that these phylogenies were comparable and gave better discrimination than 16S rRNA alone.

The goal of this study was to clearly define the species and subspecies phylogeny of Salmonella on the basis of DNA sequence analysis. The inconsistent topologies among previous studies may be attributable to the small number of taxa sampled from each subspecies. To reduce the chance of inferring an incorrect phylogeny, we increased the number of Salmonella taxa from the 16 to 20 isolates used previously to 69, and with these data we were able to resolve the phylogenetic relationships among the subspecies with respect to the division of the monophasic and diphasic subspecies. We selected four genes for analysis, recA, mdh, phoP, and gapA, on the basis of their distribution across the genome, their conservation across the salmonellae, and their analysis in previous studies. We present here a robust phylogeny of the salmonellae based on these four genes (3,459 bp for each isolate) from 67 isolates of common serotypes and two published genomes. These new data provide a more comprehensive representation of the species and subspecies of Salmonella. In addition, we determined the genetic distances between the subspecies of Salmonella and found new evidence of lateral transfer of the recA gene between two subspecies. This Salmonella phylogeny provides a template, or “backbone,” on which to overlay questions of gene and genome evolution in Salmonella.

MATERIALS AND METHODS

Isolates. All isolates were selected from the reference collections of the National Salmonella Reference Laboratory (CDC, Atlanta, GA) or the Centro Nacional de Microbiologia (CNM) (Madrid, Spain) and are listed in Table S1 in the supplemental material. These isolates were originally obtained from human clinical specimens submitted through various state health departments in the United States (CDC) or from 14 of the 17 autonomous communities in Spain (CNM). Isolates were selected on the basis of serotype frequency in North America and Europe and to include clinically important serotypes and phenotypic variants of common serotypes. Within each subspecies, at least one representative of the most commonly identified serotypes was selected. Salmonella Reference Collection (SARC) isolates were obtained through the CNM.

FIG. 1. Summary of previous phylogenetic studies. (a) Tree based on MLEE data from Boyd et al. (2). (b) Tree based on housekeeping gene sequences from Boyd et al. (2). (c) Tree based on gene acquisition data from microarray analysis by Porwollik et al. (26). (d) Tree based on invasion gene sequences from Boyd et al. (2).

PCR amplification and sequencing. All genomic DNA was prepared with the DNeasy kit (Qiagen, Valencia, CA). Amplification and sequencing primers used in this study are listed in Table S2 in the supplemental material; mdh primers are from Boyd et al. (2). A total of 100 ng of purified genomic DNA was used as the template in each PCR. Reactions were prepared with Ready-To-Go PCR beads (GE Biosciences, Piscataway, NJ) according to the product specifications; each reaction contained one Ready-To-Go PCR bead, 1 µl of pooled forward and reverse primers for the respective gene at a 0.5 µM concentration each, and 1 µl of genomic DNA at 100 ng/µl. PCR conditions for the individual genes were as follows: recA, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 53°C for 30 s, and 72°C for 2 min; mdh, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 54°C for 30 s, and 72°C for 1 min; phoP, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 53°C for 20 s, and 72°C for 1 min; and gapA, 96°C for 2 min, followed by 35 cycles of 96°C for 30 s, 55°C for 30 s, and 72°C for 1 min. PCR products were purified for sequencing using the QIAquick PCR cleanup kit (Qiagen, Valencia, CA).
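The four cycling programs above differ only in annealing temperature, annealing time, and extension time, so they are easy to encode as data. The Python sketch below does that and sums the programmed hold times per run; it ignores ramp rates and any final extension or hold step (neither is stated in the text), so the totals are lower bounds rather than actual instrument run times.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature, degrees Celsius
    seconds: int    # hold time at that temperature

@dataclass(frozen=True)
class Program:
    initial: Step               # initial denaturation
    cycle: Tuple[Step, ...]     # denature / anneal / extend, repeated each cycle
    cycles: int

    def hold_seconds(self) -> int:
        """Total programmed hold time; ramping and unstated final steps are ignored."""
        return self.initial.seconds + self.cycles * sum(s.seconds for s in self.cycle)

def three_step(anneal_c: float, anneal_s: int, extend_s: int) -> Program:
    """96°C for 2 min, then 35 cycles of 96°C for 30 s, annealing, and 72°C extension."""
    return Program(
        initial=Step(96, 120),
        cycle=(Step(96, 30), Step(anneal_c, anneal_s), Step(72, extend_s)),
        cycles=35,
    )

# Conditions as given in the text.
PROGRAMS: Dict[str, Program] = {
    "recA": three_step(53, 30, 120),
    "mdh": three_step(54, 30, 60),
    "phoP": three_step(53, 20, 60),
    "gapA": three_step(55, 30, 60),
}

if __name__ == "__main__":
    for gene, prog in PROGRAMS.items():
        anneal = prog.cycle[1]
        print(f"{gene}: anneal {anneal.temp_c}°C for {anneal.seconds} s, "
              f"~{prog.hold_seconds() / 60:.0f} min of holds")
```

Keeping the shared denaturation steps in one factory function makes the per-gene differences (recA's 2-min extension, phoP's 20-s annealing) stand out at a glance.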

Sequencing. All sequencing was performed directly on PCR products using either the CEQ 8000 genetic analysis system (Beckman Corp., Fullerton, CA) or the ABI 3700 genetic analyzer (Applied Biosystems, Foster City, CA), following the manufacturer's methods and using the manufacturer's reagents for each system.

Analysis of DNA sequences. DNA sequences were confirmed bidirectionally with fourfold coverage. Sequences were edited and aligned with Lasergene 5.0 (DNAStar, Madison, WI), and sequence alignments were performed with ClustalX (35). Subspecies distance matrices were calculated in MEGA 3.1 (15) and exported into Microsoft Excel. Synonymous substitution distances for the concatenated four-gene data set were determined with MEGA 3.1, and these distances were used to generate a neighbor-joining tree, also in MEGA 3.1.
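MEGA 3.1 performs the neighbor-joining step internally; to make it concrete, here is a compact Python sketch of the Saitou and Nei neighbor-joining algorithm that turns a distance matrix into an unrooted Newick tree. It is not MEGA's code, and the four-taxon matrix is a toy additive example (not data from this study), chosen so that the generating tree is recovered exactly.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def neighbor_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted Newick tree from a symmetric distance matrix by neighbor joining."""
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Node names double as Newick fragments; distances are keyed by (node, node).
    d: Dict[Tuple[str, str], float] = {
        (a, b): dist[i][j] for i, a in enumerate(labels) for j, b in enumerate(labels)
    }
    nodes = list(labels)
    while len(nodes) > 3:
        m = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (m - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (m - 2) * d[p] - r[p[0]] - r[p[1]])
        la = d[(a, b)] / 2 + (r[a] - r[b]) / (2 * (m - 2))
        lb = d[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"   # lengths shown with 6 significant digits
        rest = [k for k in nodes if k not in (a, b)]
        for k in rest:
            d[(u, k)] = d[(k, u)] = (d[(a, k)] + d[(b, k)] - d[(a, b)]) / 2
        nodes = rest + [u]
    # Resolve the last three nodes as a trifurcation (the tree is unrooted).
    x, y, z = nodes
    lx = (d[(x, y)] + d[(x, z)] - d[(y, z)]) / 2
    ly = (d[(x, y)] + d[(y, z)] - d[(x, z)]) / 2
    lz = (d[(x, z)] + d[(y, z)] - d[(x, y)]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"

# Toy additive distances from the tree ((A:2,B:3):1,(C:4,D:5)).
TOY_LABELS = ["A", "B", "C", "D"]
TOY_DIST = [
    [0, 5, 7, 8],
    [5, 0, 8, 9],
    [7, 8, 0, 9],
    [8, 9, 9, 0],
]

if __name__ == "__main__":
    print(neighbor_joining(TOY_LABELS, TOY_DIST))
```

Because neighbor joining is exact for additive distances, the toy matrix returns the generating tree with its original branch lengths; with real, non-additive distances some estimated branch lengths can come out slightly negative.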

Sequence alignments and branch-swapping topology tests were performed using MacClade (17). Phylogenetic analysis was performed using MEGA (15) and MrBayes (11), and the resulting tree files were viewed and edited in Treeview (21). Polymorphism, substitution, and G+C content calculations were generated in DnaSP (30) (see Table S3 in the supplemental material).
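DnaSP computes these summaries directly from the alignment. As a rough illustration of two of them, the Python sketch below (not DnaSP's implementation) computes the G+C content of a sequence and the number of segregating, i.e., polymorphic, sites in an alignment, skipping columns that contain gaps or ambiguity codes; the aligned fragments are hypothetical.

```python
from typing import List

def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

def segregating_sites(alignment: List[str]) -> int:
    """Count columns with more than one nucleotide, skipping gapped/ambiguous columns."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    count = 0
    for column in zip(*(s.upper() for s in alignment)):
        if all(b in "ACGT" for b in column) and len(set(column)) > 1:
            count += 1
    return count

if __name__ == "__main__":
    toy = ["ATGGCGTTA", "ATGGCATTA", "ATGACGTTG"]  # hypothetical aligned fragments
    print([round(gc_content(s), 3) for s in toy])
    print(segregating_sites(toy))
```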

LGT. We assessed the possibility of lateral gene transfer (LGT) events in the individual trees of recA and mdh. Phylogenetic trees were generated from these data using maximum likelihood and distance methods for both the nucleotide and amino acid sequences. Topology tests were applied to four trees for each gene using Tree-Puzzle (31): (i) the consensus tree; (ii) the best tree generated by Tree-Puzzle; (iii) the tree generated by moving the hypothetical LGT branch to the node it occupies in the concatenated four-gene tree (swap tree); and (iv) a negative control tree generated by relocating the Salmonella bongori branch to the subsp. I node (a highly implausible topology). Each tree was compared statistically with the best tree. If LGT has occurred, the swap tree is expected to fail all tests, as is the negative control. The best and consensus trees act as positive controls and should pass all tests.

The following tests were run on these trees and statistically evaluated: (i) a one-sided Kishino-Hasegawa test based on pairwise Shimodaira-Hasegawa tests (10, 14), (ii) a Shimodaira-Hasegawa test (32), (iii) an expected likelihood weight test (34), and (iv) a two-sided Kishino-Hasegawa test (14).
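To illustrate the logic of the two-sided Kishino-Hasegawa test listed above, the Python sketch below compares two trees from their per-site log-likelihoods, using a normal approximation with the variance of the total difference estimated from the site-wise differences. It is not Tree-Puzzle's implementation, the four-site input is hypothetical (real alignments have hundreds of sites), and the SH and ELW tests, which account for multiple candidate trees, are not shown.

```python
import math
from statistics import variance
from typing import Sequence, Tuple

def kh_test(site_lnl_a: Sequence[float], site_lnl_b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Kishino-Hasegawa test with a normal approximation.

    Returns (delta, p_value) with delta = lnL(A) - lnL(B) summed over sites.
    Var(delta) is estimated as n times the sample variance of per-site differences.
    """
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need paired per-site log-likelihoods for at least two sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    var_delta = len(diffs) * variance(diffs)
    if var_delta == 0:
        # Degenerate case: every site favors one tree by exactly the same amount.
        return delta, 1.0 if delta == 0 else 0.0
    z = delta / math.sqrt(var_delta)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return delta, p_value

if __name__ == "__main__":
    # Hypothetical per-site log-likelihoods for a best tree and a swap tree.
    best = [-5.0, -3.0, -5.0, -3.0]
    swap = [-6.0, -6.0, -6.0, -6.0]
    delta, p = kh_test(best, swap)
    print(f"delta lnL = {delta:.1f}, two-sided p = {p:.4f}")
```

A small p-value rejects the hypothesis that the two topologies explain the data equally well. The KH test is strictly valid only for topologies chosen a priori, which is one reason it is paired here with the SH and ELW tests.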

Mean distance matrices for each subspecies were calculated in MEGA 3.1 using the following settings: codons, modified Nei-Gojobori (Jukes-Cantor) model; transition/transversion ratio = 2; uniform rates; and 1,150 sites. These matrices were used to generate an average topology for the subspecies (see Fig. 5).
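MEGA computed these matrices under the modified Nei-Gojobori model, which counts synonymous and nonsynonymous sites per codon. The Python sketch below is deliberately simpler and only shows the two ingredients of an average between-subspecies matrix: a Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), of the observed proportion of differences p, and averaging over every isolate pair drawn from two subspecies. It works on whole nucleotide sequences rather than synonymous sites, and the sequences and group labels are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, skipping columns with gaps or ambiguity codes."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p: float) -> float:
    """d = -(3/4) ln(1 - 4p/3); saturates (infinite) when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

def mean_between_groups(groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Average JC distance over every isolate pair drawn from two different groups."""
    names = sorted(groups)
    means: Dict[Tuple[str, str], float] = {}
    for i, g1 in enumerate(names):
        for g2 in names[i + 1:]:
            ds = [jukes_cantor(p_distance(a, b)) for a, b in product(groups[g1], groups[g2])]
            means[(g1, g2)] = sum(ds) / len(ds)
    return means

if __name__ == "__main__":
    toy = {"I": ["ACGTACGTAC", "ACGTACGTAT"], "II": ["ACGAACGTAC"]}  # hypothetical
    print(mean_between_groups(toy))
```

The resulting matrix of group means can be fed to any distance-based tree method, such as neighbor joining, to obtain an average subspecies topology.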

Rate calculations. Synonymous substitution rates for the concatenated sequences were calculated in MEGA 3.1 (15) and compared with the substitution rates published by Berg and Martelius (1). These rates were used to calculate a range of divergence dates (see Fig. 5): dates were calculated with both the slower synonymous substitution rate of 6.32 × 10⁻¹⁰ estimated from these data and the faster rate of 3.0 × 10⁻⁹ used by Berg and Martelius (1), giving a range of values.
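Under a strict molecular clock, a pairwise synonymous distance K accumulated along two lineages since their split gives a divergence time T = K / (2r), and a calibration pair with a known split time gives a rate r = K_ref / (2 T_ref). The Python sketch below applies these textbook relations to show how two rates bracket a date range. It assumes both rates are substitutions per site per year (the text does not state units), uses a hypothetical distance, and does not reproduce the study's own procedure: the published dates were also anchored to the 100-MYA E. coli/Salmonella calibration, so they need not scale simply with the ratio of the two rates.

```python
from typing import Tuple

# Rates quoted in the text; units of substitutions per site per year are an assumption.
SLOW_RATE = 6.32e-10   # estimated in this study
FAST_RATE = 3.0e-9     # used by Berg and Martelius

def divergence_time_years(k: float, rate_per_site_per_year: float) -> float:
    """Strict-clock divergence time T = K / (2r); K is substitutions per site between lineages."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return k / (2 * rate_per_site_per_year)

def rate_from_calibration(k_ref: float, t_ref_years: float) -> float:
    """Rate implied by a calibration pair with a known split time: r = K_ref / (2 T_ref)."""
    return k_ref / (2 * t_ref_years)

def date_range_mya(k: float, slow: float, fast: float) -> Tuple[float, float]:
    """(youngest, oldest) divergence dates in millions of years for two rates."""
    t_fast = divergence_time_years(k, fast) / 1e6
    t_slow = divergence_time_years(k, slow) / 1e6
    return min(t_fast, t_slow), max(t_fast, t_slow)

if __name__ == "__main__":
    k_example = 0.10   # hypothetical synonymous distance, not a value from this study
    young, old = date_range_mya(k_example, SLOW_RATE, FAST_RATE)
    print(f"K = {k_example}: {young:.1f} to {old:.1f} MYA")
```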

Genome sequences used in this study include Salmonella enterica serotype Typhimurium LT2 (NC_003197), Salmonella enterica serotype Typhi CT18 (NC_003198), Escherichia coli K-12 (NC_000913), and Shigella flexneri 2a str. 301 (NC_004337).

Nucleotide sequence accession numbers. GenBank accession numbers of the deposited sequences are as follows: recA, DQ644868 to DQ644934; mdh, DQ644734 to DQ644800; phoP, DQ644801 to DQ644867; and gapA, DQ644634 to DQ644700.

RESULTS

Sequences. A DNA sequence data set was completed for the housekeeping genes recA, mdh, phoP, and gapA of 67 isolates of common serotypes of Salmonella, corresponding to 924 bp of gapA, 837 bp of mdh, 639 bp of phoP, and the complete coding sequence of recA (1,059 bp) (Table 1). The DNA sequences of these genes were also obtained from the published genomes of Salmonella enterica serotype Typhimurium LT2, Salmonella enterica serotype Typhi CT18, E. coli K-12, and Shigella flexneri and were included in this study.
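Concatenating the four partial gene sequences for each isolate gives the 3,459-bp data set analyzed below (1,059 + 837 + 639 + 924 bp). The Python sketch below assumes each gene is already aligned and stored as a dictionary from isolate name to sequence (a hypothetical data layout), and assumes the gene order recA, mdh, phoP, gapA; it also records partition boundaries, which are useful for partitioned analyses or for masking one gene in selected taxa.

```python
from typing import Dict, List, Tuple

# Lengths follow Table 1; the concatenation order itself is an assumption.
GENE_ORDER = ["recA", "mdh", "phoP", "gapA"]
EXPECTED_BP = {"recA": 1059, "mdh": 837, "phoP": 639, "gapA": 924}

Partition = Tuple[str, int, int]

def concatenate(alignments: Dict[str, Dict[str, str]]) -> Tuple[Dict[str, str], List[Partition]]:
    """Join per-gene alignments into one sequence per isolate.

    Only isolates present in every gene alignment are kept. Partitions are
    (gene, start, end) in 0-based, end-exclusive coordinates.
    """
    isolates = sorted(set.intersection(*(set(alignments[g]) for g in GENE_ORDER)))
    if not isolates:
        raise ValueError("no isolate is present in all four alignments")
    partitions: List[Partition] = []
    start = 0
    for gene in GENE_ORDER:
        lengths = {len(alignments[gene][i]) for i in isolates}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()
        partitions.append((gene, start, start + length))
        start += length
    concatenated = {i: "".join(alignments[g][i] for g in GENE_ORDER) for i in isolates}
    return concatenated, partitions

if __name__ == "__main__":
    print("expected concatenated length:", sum(EXPECTED_BP.values()), "bp")
```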

The phylogenies of the four housekeeping genes were inferred individually from both the nucleotide sequences and the predicted amino acid translations. In general, the predicted amino acid sequences did not contain enough variation to produce well-supported trees; therefore, all subsequent analyses were performed with nucleotide data. Analyses of individual genes (Fig. 2a to d) or of genes concatenated in pairs did not give consistent resolution or support. Concatenated sequences from all four genes, comprising a total of 3,459 bp for each taxon, produced a well-supported tree with both Bayesian and neighbor-joining methods, with the exception of one node dividing subsp. IIIb from the rest of the diphasic subspecies (Fig. 3).
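In a Bayesian consensus tree such as those in Fig. 2 and 3 (each summarizing 900 sampled trees), the support value of a node is the fraction of sampled trees that contain that clade. The Python sketch below assumes the trees have already been reduced to sets of clades (frozensets of taxon labels) rather than parsed from MrBayes output files, and the three-tree sample is hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]

def clade(*taxa: str) -> Clade:
    return frozenset(taxa)

def clade_support(trees: Iterable[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of sampled trees that contain each clade."""
    sample = list(trees)
    if not sample:
        raise ValueError("empty tree sample")
    counts = Counter(c for tree in sample for c in tree)
    return {c: n / len(sample) for c, n in counts.items()}

def majority_rule(support: Dict[Clade, float], threshold: float = 0.5) -> List[Clade]:
    """Clades present in more than `threshold` of trees, largest first.

    With threshold >= 0.5 the retained clades are mutually compatible.
    """
    kept = [c for c, s in support.items() if s > threshold]
    return sorted(kept, key=lambda c: (-len(c), sorted(c)))

# Hypothetical three-tree sample over a few diphasic and monophasic subspecies.
SAMPLE = [
    {clade("I", "VI"), clade("I", "VI", "II"), clade("IV", "VII")},
    {clade("I", "VI"), clade("I", "VI", "II"), clade("IV", "VII")},
    {clade("I", "II"), clade("I", "VI", "II"), clade("IV", "VII")},
]

if __name__ == "__main__":
    support = clade_support(SAMPLE)
    for c in majority_rule(support):
        print(sorted(c), round(support[c], 2))
```

A weakly supported node, like the one separating subsp. IIIb from the other diphasic subspecies, is simply a clade that appears in a small fraction of the sampled trees.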

Comparison of the topologies of the trees based on individual gene sequences indicated that the recA and mdh phylogenies may be inconsistent with the phylogeny of the concatenated sequences. The recA tree indicated a possible LGT from subsp. IIIb to subsp. IIIa (Fig. 2a). There also appeared to be a possible lateral transfer of mdh from subsp. II to subsp. IV and VII (Fig. 2b). We assessed these anomalies using four topology tests in a maximum likelihood framework. Topology testing of the four recA trees (consensus, best, swap, and control) showed that the swap tree containing the relocated branch failed three of the four tests, indicating a likely case of LGT (Fig. 4). The negative control tree also failed all tests, whereas the best and consensus trees passed all four tests. The horizontal transfer event was statistically supported at the P < 0.05 level in all three tests that the swap tree failed. These results suggest that the recA gene in subsp. IIIa was laterally transferred from subsp. IIIb.

The individual mdh tree showed clustering of subsp. IV and VII with subsp. II; however, this relationship was unsupported (Fig. 2b). This suggested a possible transfer of mdh between these subspecies. The four topology tests were performed on the individual mdh tree after swapping the subsp. IV and II branches to their predicted locations based on the concatenated tree. This analysis failed to reject the null hypothesis of no LGT event (Fig. 4).

TABLE 1. Description of the four genes analyzed in this study(a)

Gene   Location (kb)   CDS length (bp)   Sequence (bp)   Gene product description
recA   2974            1,062             1,059           DNA recombinase
mdh    3526            939               837             Malate dehydrogenase
phoP   1318            675               639             Mg2+ response regulator
gapA   1368            996               924             Glyceraldehyde-3-phosphate dehydrogenase

(a) Location refers to the gene location on the Salmonella enterica serotype Typhimurium LT2 genome sequence (accession no. NC_003197). CDS length is the coding sequence length from NC_003197. Sequence is the length of sequence in base pairs (bp) used in this study. Gene product descriptions are as listed in the S. enterica serotype Typhimurium LT2 genome sequence (NC_003197).


The detection of an LGT event involving recA implies that subsp. IIIa once carried an ancestral, vertically inherited recA lineage that was displaced by the transferred gene. In an attempt to locate such an ancestral recA lineage, we identified 21 other subsp. IIIa strains with unusual biochemical or MLEE properties showing similarities to subsp. IV or Salmonella bongori that had been described either by the National Salmonella Reference Laboratory at the CDC or by Reeves et al. (27). We found that all 21 subsp. IIIa isolates carried the laterally transferred recA sequence from the subsp. IIIb lineage (unpublished observations).

FIG. 2. Phylogeny based on individual gene sequences. Bayesian consensus of 900 trees from the individual nucleotide sequences. Support values of all major nodes are listed; internal branches with values higher than 0.75 are reported. Bars, 0.1 substitutions per site.

Because the recA gene was transferred from subsp. IIIb to subsp. IIIa, the recA sequence data from all serotypes in subsp. IIIa were removed from the concatenated analysis, and the tree was reestimated (Fig. 3). With the subsp. IIIa recA sequences removed, both neighbor-joining and Bayesian methods generated a consensus tree with strong support at the major node of each subspecies. This consensus tree also shows that every serotype tested falls within the clade formed by the other members of its subspecies. When rooted with E. coli and Shigella flexneri sequences, the tree places Salmonella bongori as the earliest-diverging lineage of the Salmonella. Subspecies IIIa was the earliest-diverging lineage of the Salmonella enterica species, followed by subsp. IV. Subspecies VII was recovered in all analyses as a sister group of subsp. IV, as previously described by Boyd et al. (2).
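Removing the transferred recA segment for subsp. IIIa while keeping its other three genes can be done by recoding that partition as missing data for the affected isolates. The Python sketch below assumes concatenated sequences with recA as the first 1,059-bp partition (an assumption about concatenation order) and a hypothetical mapping from isolate name to subspecies; it uses "?", the conventional missing-data symbol in NEXUS matrices, and is not necessarily how the sequences were removed in this study.

```python
from typing import Dict, Tuple

# Assumes recA is the first partition of the concatenation (0-based, end-exclusive).
RECA_SPAN: Tuple[int, int] = (0, 1059)

def mask_partition(seqs: Dict[str, str], subspecies: Dict[str, str], target: str,
                   span: Tuple[int, int], missing: str = "?") -> Dict[str, str]:
    """Replace one partition with missing-data symbols for isolates of `target` subspecies.

    Alignment length is unchanged, so other taxa and partitions are unaffected.
    """
    start, end = span
    masked: Dict[str, str] = {}
    for name, seq in seqs.items():
        if subspecies.get(name) == target:
            if end > len(seq):
                raise ValueError(f"{name}: partition extends past sequence end")
            seq = seq[:start] + missing * (end - start) + seq[end:]
        masked[name] = seq
    return masked

if __name__ == "__main__":
    # Hypothetical 8-bp "concatenation" whose first 4 bp stand in for recA.
    seqs = {"iso_IIIa": "AAAACCCC", "iso_IIIb": "GGGGTTTT"}
    labels = {"iso_IIIa": "IIIa", "iso_IIIb": "IIIb"}
    print(mask_partition(seqs, labels, "IIIa", (0, 4)))
```

Masking rather than deleting keeps every sequence the same length, so the remaining three genes of subsp. IIIa still contribute to the tree.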

This analysis also resolved the four diphasic subspecies, IIIb, II, VI, and I, as monophyletic and separate from the three monophasic subspecies, IIIa, IV, and VII, with Salmonella bongori representing the earliest-diverging lineage of the Salmonella.

Evolutionary dates. Given the importance of the relationship of the subspecies to specific niches, we attempted to relate the major steps in the evolution of Salmonella to identifiable periods in geologic history. Each major subspecies node in the concatenated tree topology was labeled with the approximate date of divergence based on the substitution rate from Berg and Martelius (1) as well as the rate of 6.32 × 10⁻¹⁰ generated in this study (Fig. 5). Both rates were set to a reference point of the divergence of Salmonella and Escherichia coli from their common ancestor at the accepted date of 100 million years ago (MYA) (7, 19). These estimates predict that Salmonella enterica diverged from Salmonella bongori between 40.0 and 63.4 MYA (during the Eocene epoch), that subsp. IIIa diverged between 21.5 and 34.0 MYA, and that subsp. IV diverged from the diphasic subspecies between 14.2 and 22.4 MYA, during the Miocene epoch (Fig. 5).

FIG. 3. Phylogeny based on concatenated gene trees. Bayesian consensus of 900 trees representing the phylogeny of Salmonella based on the four housekeeping genes, recA, mdh, phoP, and gapA. The node support value marked with an asterisk denotes the weakly supported node prior to removal of the subsp. IIIa recA sequence.

DISCUSSION

We present here a robust phylogeny of Salmonella species and subspecies based on the sequences of four housekeeping genes, recA, mdh, phoP, and gapA, from the 67 isolates included in this study and four published genome sequences. Analyses of these sequences gave a clear topology for the phylogenetic history of Salmonella. This study also demonstrates that all lineages within a subspecies, regardless of serotype, cluster together, suggesting that serotypes within a subspecies evolve together as one taxon.

Many previous studies have tried to clarify the complex phylogeny of the species of Salmonella. The phylogeny defined in our study closely agreed with the tree based on the invasion genes in Boyd et al. (2), except for the relationships among diphasic subsp. I, II, VI, and IIIb (Fig. 1d). The housekeeping gene tree of Boyd et al. (Fig. 1b) differed from ours by one branch, in that subsp. II and IIIb formed a clade in that study. This study confirms the clear separation of the monophasic and diphasic Salmonella subspecies and resolves conflicts among previous studies.

The specific phenotypic change from the monophasic to the diphasic state results from the acquisition of the hin and fljBA flagellin operon (33). The MLEE data from Boyd et al. (2) yield a tree that places subspecies I, VI, IIIa, and IIIb together, separate from II, IV, and VII. If correct, this would suggest that the diphasic Salmonella subspecies would have

FIG. 4. (a) Representation of topology tests based on Bayesian consensus and best trees for recA and mdh. Swap refers to relocation of the branch to its position in the concatenated tree for the topology test. Control indicates relocation of the S. bongori branch to the subsp. I branch. (b) Statistical analyses of Bayesian trees testing the hypothesis of a recA and mdh LGT event. Cons, consensus tree; best, best tree; swap, tree with the branch relocated to its position in the concatenated tree; control, tree with the Salmonella bongori branch relocated to subsp. I.

FIG. 5. Neighbor-joining tree generated from the average subspecies distance matrix. Dates at each node are given in millions of years ago.
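Fig. 5 is built from a distance matrix averaged over subspecies. One plausible way to produce such a matrix, assuming isolate-level pairwise distances are already available, is to pool every isolate pair by the subspecies of its two members and average each pool, as sketched below. The exact averaging used in the study is not described in this section, so the function `average_group_distances` and its data layout are hypothetical; the resulting matrix would then be passed to a neighbor-joining program such as MEGA (ref. 15).

```python
from itertools import combinations
from statistics import mean


def average_group_distances(dist, groups):
    """Average isolate-level distances by the groups (subspecies) of each pair.

    dist:   {(isolate_i, isolate_j): distance}; either orientation is accepted.
    groups: {isolate: group_label}.
    Returns {(group_a, group_b): mean distance} with group_a <= group_b.
    Within-group entries appear only for groups with at least two isolates.
    """
    def lookup(i, j):
        return dist[(i, j)] if (i, j) in dist else dist[(j, i)]

    pooled: dict[tuple[str, str], list[float]] = {}
    for i, j in combinations(sorted(groups), 2):
        key = tuple(sorted((groups[i], groups[j])))
        pooled.setdefault(key, []).append(lookup(i, j))
    return {key: mean(values) for key, values in pooled.items()}


# Hypothetical distances for three isolates in two subspecies.
pairs = {("a1", "a2"): 0.01, ("a1", "b1"): 0.05, ("b1", "a2"): 0.07}
labels = {"a1": "I", "a2": "I", "b1": "II"}
print(average_group_distances(pairs, labels))
```

Averaging over all isolate pairs gives each subspecies pair one value, so subspecies with many sampled serotypes do not dominate the tree.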

VOL. 190, 2008 MOLECULAR PHYLOGENY OF THE SALMONELLAE 7065

evolved either twice or earlier than otherwise reported. The study by Porwollik et al. (25), based on the presence or absence of genes determined by microarray, presents a tree in which subsp. IIIa is close to (a sister group of) the diphasic subspecies, whereas in our tree subsp. IIIa represents a basal lineage of Salmonella enterica among the monophasic subspecies. This discrepancy may reflect the transfer of recA from subsp. IIIb to subsp. IIIa that we detected; additional genes from that region or from other parts of the genome may also have been transferred between these subspecies.

In our first analysis of the concatenated gene sequences, the node separating subsp. IIIb from the other diphasic subspecies was consistently unsupported, which suggested that one or more of the genes might have an inconsistent evolutionary history. To identify potential horizontal transfer events, we inferred trees with Bayesian and neighbor-joining methods from the nucleotide sequences of each individual gene and of possible combinations of the genes. The individual recA and mdh trees both had topologies inconsistent with the concatenated trees. To determine whether horizontal exchanges had occurred, we used branch-swapping topology tests (Fig. 4). recA was inferred to have been transferred from subsp. IIIb to subsp. IIIa, whereas a recent transfer of mdh was not supported. Identifying a probable lateral transfer of the recA gene from subsp. IIIb to subsp. IIIa demonstrates that such events can occur within Salmonella, and care should be taken when inferring any phylogeny from a limited number of genes.
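The reference list cites several likelihood-based topology tests (refs. 10, 14, 32, and 34). As a sketch of the simplest of these, the Kishino-Hasegawa (KH) test compares two fixed trees through the summed per-site log-likelihood difference and its estimated standard error. The code assumes per-site log-likelihoods of the same alignment under each tree have already been computed by a likelihood program; it is not presented as the procedure behind Fig. 4, whose statistic is not specified in this section. The KH test is valid only for trees chosen a priori; when one tree is the best tree found from the same data, the Shimodaira-Hasegawa test (ref. 32) is the appropriate correction.

```python
import math
from statistics import NormalDist, variance


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided Kishino-Hasegawa test using a normal approximation.

    site_lnl_a, site_lnl_b: per-site log-likelihoods of one alignment under
    two fixed trees A and B. Returns (delta, z, p_value), delta = lnL_A - lnL_B.
    """
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need two equal-length lists of at least 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    delta = sum(diffs)
    # Var(delta) is estimated as n times the sample variance of site differences.
    se = math.sqrt(n * variance(diffs))
    if se == 0:
        raise ValueError("site differences have zero variance")
    z = delta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return delta, z, p_value


# Hypothetical per-site values for a four-site toy alignment
# (a real recA or mdh alignment has hundreds of sites).
tree_a = [-1.0, -2.0, -1.5, -3.0]
tree_b = [-1.2, -2.5, -1.4, -3.6]
delta, z, p = kh_test(tree_a, tree_b)
print(f"delta={delta:.2f} z={z:.2f} p={p:.3f}")
```

A positive `delta` favors tree A; a small `p_value` indicates that the difference in fit between the two trees is unlikely to arise from sampling variation across sites alone.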

The resulting consensus tree (Fig. 3) gives a comprehensive picture of the phylogeny of the Salmonella subspecies. The dates we infer (Fig. 5) also raise new questions about how these subspecies relate to the ecological niches that may have opened during those time periods. One example is Salmonella’s acquisition of the second flagellin operon at the divergence of subsp. IIIb from subsp. IV. According to the dates calculated in this study, this acquisition occurred during the Miocene epoch, which is characterized by the rapid expansion of grasslands and hoofed mammals (20). The origin of this evolutionary novelty may have opened a series of new niches for Salmonella.

The isolates in this study were selected on the basis of their frequency in human infections, while attempting to balance representation between the subspecies and the two species of Salmonella. Other studies have used the SARCs A, B, and C; we also examined whether the phylogenetic information obtained from the SARCs would hold true for a larger collection. A limitation of this study is that all of the isolates were human clinical isolates submitted to the Centers for Disease Control and Prevention and the Centro Nacional de Microbiologia, Spain, and were therefore limited to these geographic locations. Further study of a globally representative set of serotypes may lead to additional conclusions. Serotypes that are rarely associated with human infection are underrepresented, which may limit how fully our data reflect the true history of Salmonella. It is possible that the transfer of recA from subsp. IIIb to subsp. IIIa occurred in only one lineage, which then persisted among strains that more frequently infect humans. We subsequently examined the recA allele from 21 other subsp. IIIa isolates with atypical biochemical properties or MLEE patterns and from nonhuman sources. This search for an ancestral subsp. IIIa recA was unsuccessful: all 21 isolates carried the subsp. IIIb-derived allele. It remains to be seen whether a subsp. IIIa lineage carrying its ancestral recA gene has survived to the present, though we cannot exclude the possibility that such a lineage persists undetected in remote niches and never appears in humans, even incidentally.
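Classifying the recA alleles of the 21 additional isolates amounts to asking which reference allele each sequence most resembles. A deliberately simplified sketch using uncorrected p-distances to aligned reference sequences is shown below; the study presumably relied on phylogenetic placement rather than this shortcut, and the reference labels and sequences are hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites; gap positions are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (label, distance) of the nearest reference; ties go to the first listed."""
    return min(((label, p_distance(query, ref)) for label, ref in references.items()),
               key=lambda item: item[1])


# Hypothetical 10-bp stand-ins for the two recA allele types.
refs = {"IIIa-type": "ACGTACGTAC", "IIIb-type": "ACGTTCGTAA"}
print(closest_reference("ACGTTCGGAA", refs))  # ('IIIb-type', 0.1)
```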

This study presents the phylogeny of Salmonella species and subspecies based on the sequences of four housekeeping genes. After correcting for a single, previously unsuspected LGT event, a strong consensus phylogenetic tree emerged. This tree resolves the variant trees previously generated with smaller strain collections and other methods. The consensus phylogeny indicates that the species and subspecies of Salmonella are evolving as separate lineages and that intrasubspecies similarities are due to common ancestry. Intragenic transfer and LGT within subspecies may play a role in the isolation and divergence of the subspecies, as described by Brown et al. (3); however, recombination within Salmonella subspecies was not assessed in this study.

The information presented here will serve as a phylogenetic framework of the salmonellae for further studies investigating when other major evolutionary events occurred in the history of Salmonella.

ACKNOWLEDGMENTS

We thank Jennifer McQuiston, Linda Demma, and James Thomas for their editorial comments during preparation of the manuscript.

REFERENCES

1. Berg, O. G., and M. Martelius. 1995. Synonymous substitution-rate constants in Escherichia coli and Salmonella typhimurium and their relationship to gene expression and selection pressure. J. Mol. Evol. 41:449–456.

2. Boyd, E. F., F. S. Wang, T. S. Whittam, and R. K. Selander. 1996. Molecular genetic relationships of the salmonellae. Appl. Environ. Microbiol. 62:804–808.

3. Brown, E. W., M. K. Mammel, J. E. LeClerc, and T. A. Cebula. 2003. Limited boundaries for extensive horizontal gene transfer among Salmonella pathogens. Proc. Natl. Acad. Sci. USA 100:15676–15681.

4. Centers for Disease Control and Prevention. 2005. Salmonella: annual summary, 2004. U.S. Department of Health and Human Services, Atlanta, GA.

5. Crosa, J. H., D. J. Brenner, W. H. Ewing, and S. Falkow. 1973. Molecular relationships among the salmonelleae. J. Bacteriol. 115:307–315.

6. Dauga, C. 2002. Evolution of the gyrB gene and the molecular phylogeny of Enterobacteriaceae: a model molecule for molecular systematic studies. Int. J. Syst. Evol. Microbiol. 52:531–547.

7. Doolittle, R. F., D. F. Feng, S. Tsang, G. Cho, and E. Little. 1996. Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271:470–477.

8. Ewing, W. H. 1986. Edwards and Ewing’s identification of Enterobacteriaceae, 4th ed. Burgess Publishing Co., New York, NY.

9. Fukushima, M., K. Kakinuma, and R. Kawaguchi. 2002. Phylogenetic analysis of Salmonella, Shigella, and Escherichia coli strains on the basis of the gyrB gene sequence. J. Clin. Microbiol. 40:2779–2785.

10. Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652–670.

11. Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.

12. Judicial Commission of the International Committee on Systematics of Prokaryotes. 2005. The type species of the genus Salmonella Lignieres 1900 is Salmonella enterica (ex Kauffmann and Edwards 1952) Le Minor and Popoff 1987, with the type strain LT2T, and conservation of the epithet enterica in Salmonella enterica over all earlier epithets that may be applied to this species. Opinion 80. Int. J. Syst. Evol. Microbiol. 55:519–520.

13. Kauffmann, F. 1950. The diagnosis of Salmonella types. Charles C. Thomas, Springfield, IL.

14. Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum-likelihood estimate of the evolutionary tree topologies from DNA-sequence data, and the branching order in hominoidea. J. Mol. Evol. 29:170–179.

15. Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinform. 5:150–163.

16. Le Minor, L., and M. Y. Popoff. 1987. Request for an opinion. Designation of Salmonella enterica sp. nov., nom. rev., as the type and only species of the genus Salmonella. Int. J. Syst. Bacteriol. 37:465–468.

17. Maddison, D. R., and W. P. Maddison. 2006. MacClade 4: analysis of phylogeny and character evolution, version 4.08. Sinauer Associates, Sunderland, MA.

18. Nataro, J. P., C. A. Bopp, P. I. Fields, J. B. Kaper, and N. A. Strockbine. 2007. Escherichia, Shigella, and Salmonella, p. 671–687. In P. R. Murray (ed.), Manual of clinical microbiology, 9th ed., vol. 1. ASM Press, Washington, DC.

19. Ochman, H., and A. C. Wilson. 1987. Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74–86.

20. Ogg, J. 2004. Overview of global boundary stratotype sections and points (GSSP’s). http://www.stratigraphy.org/gssp.htm/.

21. Page, R. D. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357–358.

22. Paradis, S., M. Boissinot, N. Paquette, S. D. Belanger, E. A. Martel, D. K. Boudreau, F. J. Picard, M. Ouellette, P. H. Roy, and M. G. Bergeron. 2005. Phylogeny of the Enterobacteriaceae based on genes encoding elongation factor Tu and F-ATPase β-subunit. Int. J. Syst. Evol. Microbiol. 55:2013–2025.

23. Popoff, M. Y., J. Bockemuhl, and L. L. Gheesling. 2003. Supplement 2001 (no. 45) to the Kauffmann-White scheme. Res. Microbiol. 154:173–174.

24. Popoff, M. Y., J. Bockemuhl, and L. L. Gheesling. 2004. Supplement 2002 (no. 46) to the Kauffmann-White scheme. Res. Microbiol. 155:568–570.

25. Porwollik, S., and M. McClelland. 2003. Lateral gene transfer in Salmonella. Microbes Infect. 5:977–989.

26. Porwollik, S., R. M. Wong, and M. McClelland. 2002. Evolutionary genomics of Salmonella: gene acquisitions revealed by microarray analysis. Proc. Natl. Acad. Sci. USA 99:8956–8961.

27. Reeves, M. W., G. M. Evins, A. A. Heiba, B. D. Plikaytis, and J. J. Farmer III. 1989. Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of Salmonella bongori comb. nov. J. Clin. Microbiol. 27:313–320.

28. Reeves, P., and G. Stevenson. 1989. Cloning and nucleotide sequence of the Salmonella typhimurium LT2 gnd gene and its homology with the corresponding sequence of Escherichia coli K12. Mol. Gen. Genet. 217:182–184.

29. Roggenkamp, A. 2007. Phylogenetic analysis of enteric species of the family Enterobacteriaceae using the oriC-locus. Syst. Appl. Microbiol. 30:180–188.

30. Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.

31. Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

32. Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.

33. Silverman, M., J. Zieg, M. Hilmen, and M. Simon. 1979. Phase variation in Salmonella: genetic analysis of a recombinational switch. Proc. Natl. Acad. Sci. USA 76:391–395.

34. Strimmer, K., and A. Rambaut. 2002. Inferring confidence sets of possibly misspecified gene trees. Proc. R. Soc. Lond. Ser. B 269:137–142.

35. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

36. Tindall, B. J., P. A. Grimont, G. M. Garrity, and J. P. Euzeby. 2005. Nomenclature and taxonomy of the genus Salmonella. Int. J. Syst. Evol. Microbiol. 55:521–524.

37. Voetsch, A. C., T. J. Van Gilder, F. J. Angulo, M. M. Farley, S. Shallow, R. Marcus, P. R. Cieslak, V. C. Deneen, and R. V. Tauxe. 2004. FoodNet estimate of the burden of illness caused by nontyphoidal Salmonella infections in the United States. Clin. Infect. Dis. 38(Suppl. 3):S127–S134.
