Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns...

6
Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam 1 , David Williams 1 , and J. Peter Gogarten 2 Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125 Edited by Dieter Söll, Yale University, New Haven, CT, and approved April 23, 2010 (received for review February 5, 2010) In phylogenetic reconstruction, two types of bacterial tyrosyl-tRNA synthetases (TyrRS) form distinct clades with many bacterial phyla represented in both clades. Very few taxa possess both forms, and maximum likelihood analysis of the distribution of TyrRS types suggests horizontal gene transfer (HGT), rather than an ancient duplication followed by differential gene loss, as the contributor to the evolutionary history of TyrRS in bacteria. However, for each TyrRS type, phylogenetic reconstruction yields phylogenies similar to the ribosomal phylogeny, revealing that frequent gene transfer has not destroyed the expected phylogeny; rather, the expected phylogenetic signal was reinforced or even created by HGT. We show that biased HGT can mimic patterns created through shared ancestry by in silico simulation. Furthermore, in cases where genomic synteny is sufcient to allow comparisons of relative gene positions, both tyrRS types occupy equivalent positions in closely related genomes, rejecting the loss hypothesis. Although the two types of bacterial TyrRS are only distantly related and only rarely coexist in a single genome, they have many features in common with alleles that are swapped between related lineages. We pro- pose to label these functionally similar homologs as homeoalleles. We conclude that the observed phylogenetic pattern reects both vertical inheritance and biased HGT and that the signal caused by common organismal descent is difcult to distinguish from the sig- nal due to biased gene transfer. aminoacyl tRNA synthetase | bacteria | evolution | horizontal gene transfer | lateral gene transfer A s a consequence of horizontal gene transfer (HGT), or- ganisms harbor genes that have undergone different evolu- tionary routes, and therefore, the history of a gene can signi- cantly differ from the history of the organism itself (1, 2). The transferred genes may be novel to the recipient, or they may be homologous to genes already existing in the recipient. In the latter case, the transferred gene can be added to the recipient genome or replaced through recombination or differential loss after a temporary period of coexistence. There is controversy about how much impact HGT has on our ability to reconstruct the evolutionary history of organisms. Some believe that, although HGT does occur, it may be of small con- sequence compared with other mechanisms of evolution and that other possibilities exist to explain the observed discrepancies in tree topologies (3). Others have pointed out that HGT itself can create phylogenetically informative characters (4). HGT can result in dramatic discordance [e.g., the Prochlo- rococcus threonyl-tRNA synthetase (TyrRS) groups with γ- proteobacterial homologs (5) and the TyrRS from opisthoconts groups with Haloarcula (6)]. Such transfers between divergent species* (7) may be draped over a phylogenetic backbone of vertical descent (8, 9) and do not necessarily affect the resolution attainable for a tree-like vertical inheritance-based process of organismal evolution. In studying genome evolution, conicting genes often are ex- cluded from the analysis (10, 11), or a consensus signal or central tendency (12, 13) is extracted from multiple gene trees. The resulting tree often is considered a reection of organismal phy- logeny. However, in the case of the extremely thermophilic bac- teria, it has become obvious that an averaging approach is problematic. Judged by the majority of their genes, the Aquicales group with the ε-proteobacteria (14) and the Thermotogales group among the Clostridia (15), it is only the ribosomal components (RNA and proteins) that group the Thermotogales and Aquicales together at the base of the Bacteria. Because ribosomes are not known or expected to be transferred between divergent species (16), the close and often deep branching of Aquicales and Thermotogales frequently is considered to reect organismal his- tory (14, 15, 17). If this is true, then the majority of the genes and their phylogenetic signal reect highways of gene sharing (18). A more fundamental challenge to reconstructing a genome- based tree of life is the suggestion that gene transfer itself may create a signal that mimics the relationships created through ver- tical inheritance. In its extreme formulation, this hypothesis states that we consider organisms as related because they frequently exchange genes and not because the more similar genomes reect a more recent organismal ancestry (2, 19). The analyses of the extremely thermophilic bacterial genomes support the notion that, over time, HGT can erode the vertical signal; however, the Aquicales and Thermotogales might be considered extreme cases, and even in their case, careful analysis may still allow the reconstruction of the reticulated phylogeny of these groups con- sisting of a ribosomal backbone and several highways of gene sharing (14, 15, 17). At present, the extent to which phylogenetic patterns are caused by biased gene transfer or recent shared an- cestry remains unresolved. In many instances, a signal because of HGT will likely reinforce the similarity caused by shared ancestry. In this study, we reconstructed the phylogeny of TyrRS homologs from a broad sampling of taxa in the bacterial domain and found a topology with two deep clades that contrasted markedly to that of the 16S rRNA phylogeny of the same or- ganisms. Our analyses support vertical inheritance modulated by HGT between closely related organisms as a likely hypothesis to explain the unusual pattern in the TyrRS phylogeny. Our nd- ings negate the notion that HGT events are indiscriminate and tend to obscure prokaryotic organismal phylogeny; rather, the observed phylogenetic central tendency observed in genome analyses seems to be caused by two processes: shared ancestry and biased HGT. Author contributions: C.P.A., D.W., and J.P.G. designed research; C.P.A. and D.W. per- formed research; C.P.A., D.W., and J.P.G. contributed new reagents/analytic tools; C.P.A., D.W., and J.P.G. analyzed data; and C.P.A., D.W., and J.P.G. wrote the paper. The authors declare no conict of interest. This article is a PNAS Direct Submission. Freely available online through the PNAS open access option. 1 C.P.A. and D.W. contributed equally to this work. 2 To whom correspondence should be addressed. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1001418107/-/DCSupplemental. *We use the term species to denote a group of similar organisms without implying a particular or uniform species concept. Similarly, we use the term lineage or line of descent to denote the vertical component in the ancestry of this group. www.pnas.org/cgi/doi/10.1073/pnas.1001418107 PNAS | June 8, 2010 | vol. 107 | no. 23 | 1067910684 MICROBIOLOGY Downloaded by guest on November 16, 2020

Transcript of Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns...

Page 1: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

Biased gene transfer mimics patterns created throughshared ancestryCheryl P. Andam1, David Williams1, and J. Peter Gogarten2

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125

Edited by Dieter Söll, Yale University, New Haven, CT, and approved April 23, 2010 (received for review February 5, 2010)

In phylogenetic reconstruction, two types of bacterial tyrosyl-tRNAsynthetases (TyrRS) form distinct clades with many bacterial phylarepresented in both clades. Very few taxa possess both forms, andmaximum likelihood analysis of the distribution of TyrRS typessuggests horizontal gene transfer (HGT), rather than an ancientduplication followed by differential gene loss, as the contributor tothe evolutionary history of TyrRS in bacteria. However, for eachTyrRS type, phylogenetic reconstruction yields phylogenies similarto the ribosomal phylogeny, revealing that frequent gene transferhas not destroyed the expected phylogeny; rather, the expectedphylogenetic signal was reinforced or even created by HGT. Weshow that biased HGT can mimic patterns created through sharedancestry by in silico simulation. Furthermore, in cases wheregenomic synteny is sufficient to allow comparisons of relative genepositions, both tyrRS types occupy equivalent positions in closelyrelated genomes, rejecting the loss hypothesis. Although the twotypes of bacterial TyrRS are only distantly related and only rarelycoexist in a single genome, they have many features in commonwith alleles that are swapped between related lineages. We pro-pose to label these functionally similar homologs as homeoalleles.We conclude that the observed phylogenetic pattern reflects bothvertical inheritance and biased HGT and that the signal caused bycommon organismal descent is difficult to distinguish from the sig-nal due to biased gene transfer.

aminoacyl tRNA synthetase | bacteria | evolution | horizontal genetransfer | lateral gene transfer

As a consequence of horizontal gene transfer (HGT), or-ganisms harbor genes that have undergone different evolu-

tionary routes, and therefore, the history of a gene can signifi-cantly differ from the history of the organism itself (1, 2). Thetransferred genes may be novel to the recipient, or they may behomologous to genes already existing in the recipient. In thelatter case, the transferred gene can be added to the recipientgenome or replaced through recombination or differential lossafter a temporary period of coexistence.There is controversy about how much impact HGT has on our

ability to reconstruct the evolutionary history of organisms. Somebelieve that, although HGT does occur, it may be of small con-sequence compared with other mechanisms of evolution and thatother possibilities exist to explain the observed discrepancies intree topologies (3). Others have pointed out that HGT itself cancreate phylogenetically informative characters (4).HGT can result in dramatic discordance [e.g., the Prochlo-

rococcus threonyl-tRNA synthetase (TyrRS) groups with γ-proteobacterial homologs (5) and the TyrRS from opisthocontsgroups with Haloarcula (6)]. Such transfers between divergentspecies* (7) may be draped over a phylogenetic backbone ofvertical descent (8, 9) and do not necessarily affect the resolutionattainable for a tree-like vertical inheritance-based process oforganismal evolution.In studying genome evolution, conflicting genes often are ex-

cluded from the analysis (10, 11), or a consensus signal or centraltendency (12, 13) is extracted from multiple gene trees. Theresulting tree often is considered a reflection of organismal phy-logeny. However, in the case of the extremely thermophilic bac-

teria, it has become obvious that an averaging approach isproblematic. Judged by the majority of their genes, the Aquificalesgroupwith the ε-proteobacteria (14) and the Thermotogales groupamong the Clostridia (15), it is only the ribosomal components(RNAand proteins) that group theThermotogales andAquificalestogether at the base of the Bacteria. Because ribosomes are notknown or expected to be transferred between divergent species(16), the close and often deep branching of Aquificales andThermotogales frequently is considered to reflect organismal his-tory (14, 15, 17). If this is true, then the majority of the genes andtheir phylogenetic signal reflect highways of gene sharing (18).A more fundamental challenge to reconstructing a genome-

based tree of life is the suggestion that gene transfer itself maycreate a signal that mimics the relationships created through ver-tical inheritance. In its extreme formulation, this hypothesis statesthat we consider organisms as related because they frequentlyexchange genes and not because the more similar genomes reflecta more recent organismal ancestry (2, 19). The analyses of theextremely thermophilic bacterial genomes support the notionthat, over time, HGT can erode the vertical signal; however, theAquificales and Thermotogales might be considered extremecases, and even in their case, careful analysis may still allow thereconstruction of the reticulated phylogeny of these groups con-sisting of a ribosomal backbone and several highways of genesharing (14, 15, 17). At present, the extent to which phylogeneticpatterns are caused by biased gene transfer or recent shared an-cestry remains unresolved. In many instances, a signal because ofHGT will likely reinforce the similarity caused by shared ancestry.In this study, we reconstructed the phylogeny of TyrRS

homologs from a broad sampling of taxa in the bacterial domainand found a topology with two deep clades that contrastedmarkedly to that of the 16S rRNA phylogeny of the same or-ganisms. Our analyses support vertical inheritance modulated byHGT between closely related organisms as a likely hypothesis toexplain the unusual pattern in the TyrRS phylogeny. Our find-ings negate the notion that HGT events are indiscriminate andtend to obscure prokaryotic organismal phylogeny; rather, theobserved phylogenetic central tendency observed in genomeanalyses seems to be caused by two processes: shared ancestryand biased HGT.

Author contributions: C.P.A., D.W., and J.P.G. designed research; C.P.A. and D.W. per-formed research; C.P.A., D.W., and J.P.G. contributed new reagents/analytic tools;C.P.A., D.W., and J.P.G. analyzed data; and C.P.A., D.W., and J.P.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.1C.P.A. and D.W. contributed equally to this work.2To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1001418107/-/DCSupplemental.

*We use the term species to denote a group of similar organisms without implyinga particular or uniform species concept. Similarly, we use the term lineage or line ofdescent to denote the vertical component in the ancestry of this group.

www.pnas.org/cgi/doi/10.1073/pnas.1001418107 PNAS | June 8, 2010 | vol. 107 | no. 23 | 10679–10684

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0

Page 2: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

ResultsPhylogeny of Bacterial TyrRS. TyrRSs are aminoacyl-tRNA syn-thetases (aaRS) that catalyze the attachment of tRNA with itscognate amino acid (tyrosine). We generated a maximum likeli-hood phylogenetic tree based on aligned bacterial amino acidTyrRS sequences sampled to include representatives from allmajor bacterial groups and used archaeal homologs as an outgroup(Fig. 1 andTable S1). Similar to earlier analyses (20), we detect twotypes of this enzyme in Bacteria that we refer to as types A and B,each forming a well-supported distinct clade. These two types seemto have evolved early in or before the deepest divergence betweenextant Bacteria. Inclusion of divergent outgroup sequences, to-gether with both types of bacterial TyrRS in a single analysis,lowers the resolution and may make the analysis susceptible tosystematic artifacts. We, therefore, also analyzed the phylogeny of

each bacterial TyrRS type separately (Fig. S1). Although manybacterial phyla (Spirochaetes, Firmicutes, Planctomycetes, Bac-teroidetes, Verrucomicrobia, Actinobacteria, and δ-, γ-, andβ-proteobacteria) are represented in both clades, in our analysis of142 bacterial genomes, only three carry both types.For each of the two TyrRS groups (Fig. 1), phylogenetic re-

construction resulted in trees similar to the ribosomal tree (Fig. 2).For each clade of bacterial TyrRS, the different Proteobacteriagroup together, and the Firmicutes constitute deeper branchinglineages. Although the affiliation of members of a particularphylum does not exactly match that of the ribosomal tree, theoverall phylogenetic structure for each bacterial TyrRS type issimilar at the class and phylum levels. A few exceptions, such asthe clustering of Pseudomonas aeruginosa (γ-proteobacteria) withthe rest of the α-proteobacteria and Desulfuromonas acetoxidans

Fig. 1. Phylogenetic analysis of bacterial TyrRS amino acid sequences. Thetree is rooted using the sequences from Aeropyrum pernix and Meth-anococcus maripaludis as outgroup. Numbers on each branch show the sup-port values (from left to right: calculated using PhyML (30) with JTT+I+G,calculated distance analysis using TREE-PUZZLE (31) and PUZZLEBOOT (31),calculated Bayesian inference using MrBayes (33), and calculated using PhyML(30) with LG+I+G. Dash indicates values below 50% for bootstrap supportvalues and below 0.50 for posterior probabilities. Phylum and/or class namesare indicated on the right of the tree and represented by colored boxes.

Fig. 2. Phylogenetic analysis of the 16S rRNA sequences of all taxa includedin Fig. 1. Bacterial taxa with a type A TyrRS are shown having greenbranches, bacterial taxa with a type B TyrRS are shown having red branches,and bacterial taxa for those taxa carrying both types are shown having bluebranches. The tree is rooted using the sequences from Aeropyrum pernixand Methanococcus maripaludis as outgroup (black branches). The tree wascalculated using PhyML using the Hasegawa, Kishino, and Yano model.Deeper branches are colored under the assumption that the last commonancestor of the Bacteria has both types, and the observed distribution isexplained only through gene loss. Fifty-nine independent gene-loss eventsare needed to explain the data through differential gene loss. Colored blocksrepresent different phyla corresponding to those found in Fig. 1.

10680 | www.pnas.org/cgi/doi/10.1073/pnas.1001418107 Andam et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0

Page 3: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

(δ-proteobacteria) with Bacteroidetes in the type A clade, dooccur, which one would expect for rare interphylum transfers.

Similarity of Phylogenetic Patterns Between TyrRS and 16S rRNA. Toassess the agreement in phylogeny for each TyrRS type with the16S rRNA, we plotted the pairwise distances in TyrRS sequenceagainst their corresponding distances in 16S rRNA for each pairof taxa (Fig. 3A). These types of plots have been previously usedto detect gene transfers (21–23). However, in this study, we didnot use this approach to show HGT but to illustrate the corre-lation between each of the two TyrRS clades and the 16S rRNAtree. We show that positive correlation exists, despite the oc-currence of gene transfer, and that there is no obvious trace ofHGT that can be detected by this method, because the transfersoccur between taxa with close phylogenetic affinity.Within each TyrRS type, the distances between pairs of TyrRS

orthologs and the corresponding 16S rRNA gene pairs showa strong correlation (Fig. 3A) [R2 = 0.89 and P < 2.2 × 10−16 fortype A (plotted in green) and R2 = 0.93 and P < 2.2 × 10−16 fortype B (plotted in red); R2 is the square of the correlation co-efficient). Within each of the two TyrRS clades, evolution issimilar to the expected ribosomal phylogeny, although closelyrelated bacteria can be found in the two different TyrRS cladesand only rarely does an organism encode both types in its genome.

The distances between the two TyrRS types do not correlate to thecorresponding 16S rRNA distances (in blue). Each type, whenconsidered alone (Fig. S1), behaves as expected for a gene passedon through vertical inheritance. Only when both types are con-sidered does a strong conflict with the organismal phylogeny, asinferred from the 16S rRNA sequences, become apparent.

HGT Versus Gene Loss only. Our explanation to account for therecovery of the expected organismal phylogeny from each TyrRStype is that these genes were transferred mainly within majorbacterial groups with a transfer bias to partners more closely re-lated as judged by the ribosomal RNA phylogeny. In this setting,a lineage can gain and lose genes as it continues to swap geneticmaterial with other organisms. The alternative scenario is that thetwo types were present in the last common ancestor of the bac-terial domain and were differentially lost across lineages. Thislatter scenario implies the stable coexistence of the two TyrRStypes over long periods of time followed by late gene-loss events(i.e., losses occurred after the divergence of the different bacterialphyla). In the case of the genus Pseudomonas, in which somegenomes carry both types and someonly have typeB (SIDiscussion),losses of type A would have had to occur extremely late in evo-lution, after the divergence of the genus Pseudomonas (see SIDiscussion). If we were to assume that the ancestral organism

Fig. 3. Comparison and analyses of tyrosyl-tRNA synthetase sequences from different bacteria. (A) Scatterplot of pairwise evolutionary distances between the16S rRNA (x axis) and TyrRS (y axis) sequences in a diverse sampling of bacteria. Distances between taxa within the TyrRS type A clade (plotted in green) showa strong correlationwith the 16S rRNA distances (R2 = 0.86; P< 2.2× 10−16), as do the distances within the TyrRS type B plotted in red (R2 = 0.93, P< 2.2 × 10−16). Astrong correlation would be expected between the pairwise distances of two trees with similar topologies. The evolutionary distances between the two TyrRSclades relative to those taxa distances within the 16S rRNA phylogeny (plotted in blue) are relatively large and indicative of an ancient duplication. (B) The geneneighborhood of the two tyrRS types carried by Pseudomonas aeruginosa PAO1 comparedwith other species that carry either type A or B. (C) Plots ofmean and95% confidence (error bars) of R2 values of the pairwise evolutionary distances among orthologs of 16S rRNA and each TyrRS type defined bymajor clade (TyrRStype A in green and type B in red) during 50 unbiased (to the left of the vertical dashed line) and subsequent biased (to the right of the vertical dashed line)simulated TyrRSHGTevents (n=10). TheR2 valueswereused as ameasureof congruencebetween the TyrRSphylogenyand the 16S rRNAphylogeny. Simulationsstarted with the evolutionary distances observed in extant sequences (details in SI Materials and Methods).

Andam et al. PNAS | June 8, 2010 | vol. 107 | no. 23 | 10681

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0

Page 4: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

carried both types of this enzyme and map the presence of eachtype on the 16S rRNA phylogenetic tree, a minimum of 59 in-dependent recent gene-loss events will have to have occurred overtime (Fig. 2). Assuming similar selection acted on the tyrRS genesthroughout bacterial evolution, it seems unlikely that two TyrRStypes frequently co-occurred and persisted in the same organism,whereas today, only very few organisms carry genes for both. Incontrast to this expectation, the gene loss only scenario requiresthat the two TyrRS types frequently co-occurred and have beenmaintained in the same organism for a long period.To provide a quantitative test for these two scenarios, we

tested their fit to the observed data using a probabilistic model ofevolution from the program LGT3State (24) that estimates thetransition probabilities among the three possible TyrRS statesand calculates the maximum log-likelihood values for each sce-nario. The LGT3State model has four parameters that give therates of gain and loss of each TyrRS type. The probabilities ofgene-loss and gene-gain events are not necessarily equal, anda scaling factor relates edge lengths in the tree to gain/loss events(24). Both gene losses and gains of the alternative gene can occur(i.e., a genome carrying a tyrRS type B can gain a type A and viceversa, resulting in a genome with both types A and B presentand from which one tyrRS type may eventually be lost). The nullmodel states that HGT does not occur in the evolution of thetyrRS genes and only gene-loss events can explain the distributionpatterns of types A and B in the bacterial domain (i.e., thismodel corresponds to the previous one but with the rates for gainset to 0). The null model implies that the bacterial ancestorcarried both types of TyrRS. We obtained a maximum log-likelihood value of −101.8 for the null model and −88.7 for theHGT model. Using parametric bootstrap as implemented inLGT3State (24), we generated 1,000 datasets under the nullmodel. The log-likelihood values obtained for all 1,000 datasetsunder the HGT model, ranging from −89.7 to −111.8, werelower than those for the original dataset. We, therefore, canreject the null model (ancient gene duplication followed by geneloss only) with a significance level of P < 0.001. These resultssuggest that a combination of gene loss and gene gain throughHGT occurred in the evolution of the bacterial tyrRS genes.If we were to assume that the common ancestor of the Bac-

terial domain carried only the TyrRS type A, there would havebeen a total of 39 transitions to the type B or to the presence ofboth A and B (determined from Fig. 2). A total of 26 transitionscan explain the present taxonomic distribution if the commonancestor had carried only the B type.The number of extant bacteria in our analysis that carry two

copies of the gene is small (3 of 142 genomes selected to providea wide phylogenetic sampling of the bacterial domain), whereasall others have one copy or the other (Fig. 1). Of the more than1,000 bacterial genomes available on the National Center forBiotechnology Information (NCBI) database, only 30 specieswere found to carry both types of tyrRS. The acquisition ofa second tyrRS gene may temporarily render a selective benefit tothe organism. However, over long periods of time, the secondcopy may become redundant, and one of the copies may be lostwithout detrimental effect to the organism (Table S2) (SI Dis-cussion provides information on the preservation of the two typesof tyrRS genes in a genome).

HGT Between Close Relatives Inferred from Phylogenetic Incongruity.We observed that small, well-resolved clades in the TyrRS treeare broken up in the 16S rRNA tree across bipartitions with highbootstrap support. These phylogenetic conflicts are suggestive ofHGT between close relatives. For example, the six-genome cladeencompassing Xylella fastidiosa to Halorhodospira halophila inthe TyrRS tree (Fig. 1) is separated by well-supported biparti-tions in the 16S rRNA tree (Fig. 2). In the same vein, Mar-inomonas sp., Cellvibrio japonicus, Aliivibrio salmonicida, and

Psychromonas sp. carrying type A cluster together in the TyrRStree (Fig. 1), but this group is broken up in the ribosomal tree(Fig. 2). The Firmicutes showed a similar pattern with Natra-naerobius thermophilus, Clostridium difficile, and Desulfitobacte-rium hafniense carrying type B clustering together in the TyrRStree (Fig. 1) but dispersing in the 16S rRNA tree (Fig. 2). Ourfinding of the involvement of HGT in the evolution of bacterialTyrRS does not exclude vertical inheritance, but it implies thatvertical inheritance alone cannot explain the observed TyrRSpatterns because of the overall incongruity with the rRNA tree.The identity of the exchange partners can be inferred by looking

at the ribosomal tree that shows the mapping of the two types ofTyrRS (Fig. 2). Within the γ-proteobacteria, for example, A. sal-monicida and Photobacterium profundum share a most recentcommon ancestor based on the 16S rRNA tree (Fig. 2). However,each carries a different TyrRS type. Based on the distribution ofthe two TyrRS on the 16S rRNA tree, it is possible that P. pro-fundum has acquired its TyrRS type B from a closely related cladethat includes Mannheimia succiniciproducens. The membersof this group all carry a type B enzyme and are located close toP. profundum (Fig. 2). In the same way, A. salmonicida may haveobtained its TyrRS type A from any one of the neighboring taxathat possessed the same type of synthetase (Fig. 2).These examples illustrate that exchange bias between close rel-

atives maintains the large-scale similarity between each clade ofTyrRSand the16S rRNAtree; however, at a smaller scale like in thecase of the γ-proteobacteria, topological conflicts with the 16SrRNA tree can be observed. These differences in branching pat-terns within the group result from gene transfer that takes placeamong members of that assemblage. However, as long as the indi-viduals continue to swapgenes, thegroupas awholewill persist overtime because of the higher levels of gene transfer within each groupthan between groups. Then, this biased HGT acts as a homogeniz-ing force that provides long-term constancy to prokaryotic groups.

Characteristicsof theGenomicNeighborhoodof tyrRS inγ-proteobacteria.Observation of the gene order (Fig. 3B) within the genomic vicinityof tyrRS in several γ-proteobacterial genomes reveals similarneighboring gene identity and order for both tyrRS types A and B.In several different genomes, either tyrRS type was found amonganmK, argC, 16S rRNA, and 23S rRNA. For example, Mar-inomonas sp., C. japonicus, and Psychromonas sp. have the tyrRStype A, whereas Marinobacter aquaeolei, Chromohalobacter sale-xigens, Hahella chejuensis, and Pseudomonas aeroginosa have thetyrRS type B; in all cases, the surrounding genes show a high degreeof synteny. The similarities in tyrRS gene neighborhood but dif-ferences in tyrRS type imply that the integration of the transferredtyrRS occurred by homologous recombination. Although the twotyrRS types have low identity, homologous replacement withbreakpoints in the neighboring genes may have facilitated theswitching between tyrRS types. P. aeruginosa also carries tyrRS typeA, but the genomic neighborhood of this gene does not showsynteny with that of the tyrRS type A of other closely related taxa.The α-proteobacteria may have been the donor of this additionaltyrRS gene inP. aeruginosa, because they group closely in Fig. 1 andwould be a rare example of an interphylum transfer.The finding of similar tyrRS gene neighborhoods in close rela-

tives carrying either type A or type B provides evidence for thereplacement of one form of tyrRSwith another form. Significantly,this rejects the differential loss hypothesis for these organisms. Ifthe last common bacterial ancestor possessed both types ofenzymes, we would expect the genes coding for each type of tyrRSto possess dissimilar genes flanking them. It was not possible totest this hypothesis between more divergent organisms because ofa loss of genomic synteny.

Biased HGT Can Maintain Patterns of Shared Ancestry. According toour hypothesis, the 16S rRNA-like phylogeny within each TyrRS

10682 | www.pnas.org/cgi/doi/10.1073/pnas.1001418107 Andam et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0

Page 5: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

clade has been maintained by vertical inheritance and biasedtyrRS transfers. If gene transfer events were unbiased, we wouldexpect a departure from the 16S rRNA-like taxon distribution(i.e., a decrease in correlation between each of the two TyrRStypes and their corresponding 16S rRNA distances). To test thisprediction, we simulated biased and unbiased recent HGT eventson the TyrRS phylogeny inferred from the observed data. Usingthe TyrRS evolutionary distances, 50 unbiased HGT events weresimulated by randomly selecting pairs of species and sequentiallyswapping them. This approach assumes the same evolutionaryhistory that gave rise to the real data, but it adds recent additionaltransfer events that are approximated as exchanges betweenrandomly selected genomes. These simulations avoid potentialartifacts introduced when fitting the distances to a tree model andallow us to test the effects of biased versus unbiased transfers.The R2 of pairwise evolutionary distances, used above to assessthe phylogenetic similarity of each TyrRS clade with that of the16S rRNA clade, were recalculated after each simulated HGTevent. We observed the predicted loss of correlation betweeneach of the two TyrRS types and their corresponding 16S rRNAdistances relative to the observed data (Fig. 3C and Fig. S2).For the subsequent 250 HGT events, a bias in the selection of

swapping partners was introduced, weighted in favor of shorter16S rRNA distances between potential partners. For each type ofTyrRS, biased HGT events resulted in an increase in the R2 ofpairwise evolutionary distances (Fig. 3C and Fig. S2). The initialloss in correlation shows that unbiased HGT will destroy thephylogenetic signal originally present. The subsequent biasedHGT simulations, beyond the vertical dashed line (Fig. 3C), showthat, when biased by 16S rRNA relatedness, HGT can recreatepatterns of vertical descent, represented by an increasing simi-larity between the ribosomal and TyrRS phylogenies measured asthe R2. Furthermore, given that each of the 250 simulated biasedtransfers involves 2 of 142 genomes, the possibility arises thata pattern resembling that of vertical descent could remain even ifevery tyrRS copy has undergone a transfer at least one time (i.e.,none of the lineages might have retained the same copy contin-uously since the bacterial common ancestor).

DiscussionEvidence for Biased Gene Transfer Between Close Relatives.We haveshown three lines of evidence for preferential gene transferhaving the potential to create phylogenetic patterns comparablewith those generated by shared ancestry. These transfers arecharacterized by the preference of taxa to exchange genes withpartners more similar to themselves rather than rare HGTs thatmay occur randomly and indiscriminately.First, phylogenetic reconstruction showed that, from a large-

scale perspective, each of the two major clades of bacterial TyrRSresembles the 16S rRNA (Figs. 1–3). However, at a smaller scale(i.e., within some groups below the class level), we observedbranching patterns that were in conflict with the 16S rRNA tree.This indicates that there is a higher frequency of gene transferwithin than between members of closely related groups. Second,under a maximum likelihood framework, we showed that theHGT model is preferred over the differential loss model based onobserved character states and inferred phylogenies (LGT3State)(24). Third, similar genes were observed flanking the tyrRS typesin close relatives, providing evidence for the replacement of onetype of tyrRS with another type (Fig. 3B). This observation rejectsthe differential loss hypothesis for those organisms.Furthermore, we showed in silico that biased gene transfer

among more closely related partners can maintain patterns cre-ated through vertical descent. It is the biased nature of the HGTevents we are inferring that causes our contrasting observations.We report both an apparent absence of HGT between highertaxonomic levels across the Bacteria and the presence of multi-ple HGT events within lower taxonomic levels.

Population Genetic Point of View. In analyzing the evolutionaryhistory of microbial lineages, the principles of population ge-netics can be applied to higher taxonomic units (25). A pop-ulation is an aggregate of freely crossing individuals that possessa common gene pool, which acts as a genetic reservoir that lin-eages of that population may have access to as they evolvethrough time. However, because of HGT, the gene pool is notlimited to an individual species. Whereas the phylogenetic rangeover which bacteria can swap genes is far broader than previouslyimagined, transfers are biased to related organisms (2, 26).The two TyrRS enzymes can be seen as variants similar to

alleles in a population. Individual lines of descent possess one orthe other of these variants, and in rare instances, they can possessboth variants. We propose to call these variants homeoalleles.Like alleles, they have the same general function but can havedistinct characteristics. The line of descent has access to a genepool that contains both tyrRS homeoalleles, whereas individualstrains and species usually contain only one of the homeoalleles.Genes in the accessible gene pool have different propensities tobe swapped into an individual line of decent. The rate of suc-cessful transfers may relate to overall similarity because of severalreasons: organisms use the same transfer machinery, phages in-fect both organisms, machineries for transcription and translationare similar and recognize the same controls, and similar signalsfunction in replication and genome organization (26). The moregenes that two-lineages transfer between each other, the moresimilar that the lineages become and the more frequently thatthey will continue to exchange genes.The picture that emerges for the line of descent and the gene

pool it can access is of a line representing the history of the genomesurrounded by a vapor cloud of genes representing the gene pool.Genes in this cloud can be integrated into the genome, and genesthat are lost from the genomes can remain in this cloud. This vaporcloud of genes includes the mobilome (27) as well as the genespresent in other genomes. Genes that are close to the genome lineare more frequently integrated into the genome, whereas genes atthe periphery of the cloud are only infrequently integrated.The two TyrRS types are so divergent (∼28% identity between

the two Pseudomonas TyrRS types) that the possibility of homol-ogous recombination within the TyrRS homeoalleles is unlikely.Consequently, in case of TyrRS homeoalleles, we expect that theunit of successful HGT includes at least the complete tyrRS gene.Then, homeoalleles can be brought together temporarily in a lin-eage after horizontal transfer, but because of the largely over-lapping function, only one of the homeoalleles is likely to survive ina lineage on the long run (SI Discussion provides information onthe preservation of the two types of tyrRS genes in a genome).Previous studies on other aaRS gene families have shown pat-

terns similar to that found in TyrRS (20, 21). Thus, biased genetransfer may not be restricted only to bacterial TyrRS, and home-oallele transfer may be an important mechanism in microbial evo-lution. Analyses of other potential homeoalleles will provide insightto the extent of biased transfer and its role as a natural process, inaddition to vertical inheritance, in the evolution of lineages.

Cladistics Versus Phenetics. For each type of TyrRS, the observedphylogenetic pattern is created through a combination of verticalinheritance and biased HGT. It is reasonable that similar pat-terns of gene transfer are followed in other gene families as well(14, 15). Consequently, the degree of relatedness, as measuredby the average phylogenetic signal retained in the genomes, isa function of two processes: the frequency by which organismsswap genes with each other and how long ago they evolved froma common organismal ancestor.To the extent that an observed phylogenetic signal is caused by

biased gene transfer, the resulting signal can be considered aphenetic one. If genes are mainly transferred between closely

Andam et al. PNAS | June 8, 2010 | vol. 107 | no. 23 | 10683

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0

Page 6: Biased gene transfer mimics patterns created through ... · Biased gene transfer mimics patterns created through shared ancestry Cheryl P. Andam1, David Williams1, and J. Peter Gogarten2

related organisms, then gene transfer reinforces similarity, re-gardless of if it is because of shared ancestry or biased HGT.Most often, biased gene transfer will reinforce similarity because

of recent shared ancestry. Thus, instead of eroding the phyloge-netic signal retained in a genome,HGTmay act to reinforce groupscreated through vertical descent. However, HGT that is not biasedby similarity because of shared ancestry but is biased by otherfactors, such as overlapping ecological niches, can eventuallyoverwhelm the signal created through shared ancestry (14, 15).Both HGT and vertical inheritance are natural processes that in-fluence the phylogenetic signal contained in genomes, and thus,groups defined by an averaged phylogenetic signal could be con-sidered natural; however, there is no guarantee that they are onlycaused by shared organismal ancestry. Consideration of the pro-pensity of individual genes to be horizontally transferred may helpto differentiate between the vertical and horizontal signals.

Conclusion and Outlook. In case of bacterial TyrRS, the twohomeoalleles can be clearly distinguished. The distribution ofthese homeoalleles reveals their transfer between related organ-isms. For gene families without homeoalleles or with homeo-alleles that cannot be recognized easily, conflicts between geneand species phylogenies remain the best possibility to infer pastHGTs. However, phylogenetic conflicts caused by HGT betweenclose relatives are difficult to distinguish from those caused bylack of resolution, although increased taxon sampling may im-prove the reliable detection of HGTs in such cases. The best ap-proach to gauge the relative influence of shared ancestry versusbiased HGT on the average phylogenetic signal contained in mi-crobial genomes might be the analyses of additional homeoallele-

containing gene families and realistic modeling of the processesthat led to the disjunct homeoallele distributions. Such analysesmay allow a more confident characterization of lineages with re-spect to the long-term maintenance of homeoalleles and thecharacter of HGT biases.

Materials and MethodsProtein sequences of TyrRS were retrieved by BLASTP searches of the non-redundant protein database and BLAST microbial genome database fromthe NCBI website (28). Sequences were aligned using the MUSCLE algorithm(29) with default parameters. Phylogenetic reconstruction of the TyrRSsequences was performed using PHYML (30), TREEPUZZLE (31), PUZZLEBOOTv1.03 (32), and MrBayes version 3.1.2 (33). The congruence between theTyrRS phylogeny and the 16S rRNA phylogeny within each of the two majorclades was assessed by calculating the R2 of the pairwise evolutionary dis-tances of the two datasets. Scatterplots, statistical analysis, and simulationsof HGT events were implemented using the R statistical computing languageversion 2.9.2 (34). HGT events were simulated by performing reciprocalswaps of TyrRS sequences between pairs of taxa. A likelihood-based evolu-tionary simulation for inferring rates of HGT and gene loss was performedusing the program LGT3State (24). To identify genes neighboring the twotyrRS types, genomic synteny among several members of the γ-Proteobac-teria was studied by aligning the genomes using the Integrated MicrobialGenomes software tool provided by the U.S. Department of Energy JointGenome Institute (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi; SI Materialsand Methods).

ACKNOWLEDGMENTS. The authors thank Olga Zhaxybayeva, Greg Fournier,and Ford Doolittle for many useful suggestions and discussions during thecourse of this study, Timothy Harlow for help with installing and running theprogram LGT3State, and the Biotechnology Bioservices Center of theUniversity of Connecticut for technical support. This work was supportedby National Science Foundation Grant DEB 0830024.

1. Hilario E, Gogarten JP (1993) Horizontal transfer of ATPase genes—the tree of lifebecomes a net of life. Biosystems 31:111–119.

2. Gogarten JP, Doolittle WF, Lawrence JG (2002) Prokaryotic evolution in light of genetransfer. Mol Biol Evol 19:2226–2238.

3. Kurland CG, Canback B, Berg OG (2003) Horizontal gene transfer: A critical view. ProcNatl Acad Sci USA 100:9658–9662.

4. Huang J, Gogarten JP (2006) Ancient horizontal gene transfer can benefitphylogenetic reconstruction. Trends Genet 22:361–366.

5. Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (2006)Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal genetransfer events. Genome Res 16:1099–1108.

6. Huang J, Xu Y, Gogarten JP (2005) The presence of a haloarchaeal type tyrosyl-tRNAsynthetase marks the opisthokonts as monophyletic. Mol Biol Evol 22:2142–2146.

7. Gevers D, et al. (2005) Opinion: Re-evaluating prokaryotic species. Nat Rev Microbiol3:733–739.

8. Gogarten JP (1995) The early evolution of cellular life. Trends Ecol Evol 10:147–151.9. Dagan T, Artzy-Randrup Y, MartinW (2008) Modular networks and cumulative impact of

lateral transfer inprokaryotegenomeevolution.ProcNatlAcadSciUSA105:10039–10044.10. Lerat E, Daubin V, Moran NA (2003) From gene trees to organismal phylogeny in

prokaryotes: The case of the gamma-Proteobacteria. PLoS Biol 1:E19.11. Ciccarelli FD, et al. (2006) Toward automatic reconstruction of a highly resolved tree

of life. Science 311:1283–1287.12. Zhaxybayeva O, Lapierre P, Gogarten JP (2004) Genome mosaicism and organismal

lineages. Trends Genet 20:254–260.13. Puigbò P, Wolf YI, Koonin EV (2009) Search for a ‘Tree of Life’ in the thicket of the

phylogenetic forest. J Biol 8:59.14. Boussau B, Guéguen L, Gouy M (2008) Accounting for horizontal gene transfers

explains conflicting hypotheses regarding the position of aquificales in the phylogenyof Bacteria. BMC Evol Biol 8:272.

15. Zhaxybayeva O, et al. (2009) On the chimeric nature, thermophilic origin, andphylogenetic placement of the Thermotogales. Proc Natl Acad Sci USA 106:5865–5870.

16. Sorek R, et al. (2007) Genome-wide experimental determination of barriers tohorizontal gene transfer. Science 318:1449–1452.

17. Swithers KS, Gogarten JP, Fournier GP (2009) Trees in the web of life. J Biol 8:54.18. Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc

Natl Acad Sci USA 102:14332–14337.

19. Olendzenski L, Zhaxybayeva O, Gogarten JP (2001) Cellular Origin and Life in ExtremeHabitats. Vol. 4: Symbiosis, ed Seckbach J (Kluwer, Dordrecht, The Netherlands), pp67–78.

20. Woese CR, Olsen GJ, Ibba M, Söll D (2000) Aminoacyl-tRNA synthetases, the geneticcode, and the evolutionary process. Microbiol Mol Biol Rev 64:202–236.

21. Farahi K, Pusch GD, Overbeek R, Whitman WB (2004) Detection of lateral genetransfer events in the prokaryotic tRNA synthetases by the ratios of evolutionarydistances method. J Mol Evol 58:615–631.

22. Novichkov PS, et al. (2004) Genome-wide molecular clock and horizontal genetransfer in bacterial evolution. J Bacteriol 186:6575–6585.

23. Gogarten JP, Zhaxybayeva O (2008) Computational Methods for UnderstandingBacterial and Archaeal Genomes, eds Xu Y, Gogarten JP (Imperial College Press,London), pp 137–151.

24. Stern A, et al. (2010) An evolutionary analysis of lateral gene transfer in thymidylatesynthase enzymes. Syst Biol 59:212–225.

25. Gogarten JP, Townsend JP (2005) Horizontal gene transfer, genome innovation andevolution. Nat Rev Microbiol 3:679–687.

26. Lawrence JG, Hendrickson H (2005) Genome evolution in bacteria: Order beneathchaos. Curr Opin Microbiol 8:572–578.

27. Frost LS, Leplae R, Summers AO, Toussaint A (2005) Mobile genetic elements: Theagents of open source evolution. Nat Rev Microbiol 3:722–732.

28. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignmentsearch tool. J Mol Biol 215:403–410.

29. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and highthroughput. Nucleic Acids Res 32:1792–1797.

30. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate largephylogenies by maximum likelihood. Syst Biol 52:696–704.

31. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximumlikelihood phylogenetic analysis using quartets and parallel computing.Bioinformatics 18:502–504.

32. Holder ME, Roger A (1999) PUZZLEBOOT (Marine Biological Laboratory, Woods Hole,MA).

33. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference undermixed models. Bioinformatics 19:1572–1574.

34. Ihaka R, Gentleman R (1996) R: A language for data analysis and graphics. J ComputGraph Stat 5:299–314.

10684 | www.pnas.org/cgi/doi/10.1073/pnas.1001418107 Andam et al.

Dow

nloa

ded

by g

uest

on

Nov

embe

r 16

, 202

0