
“Patchy-Tachy” Leads to False Positives for Recombination

Stephanie Sun,1 Ben J. Evans,1 and G. Brian Golding*,1

1Department of Biology, McMaster University, Hamilton, Ontario, Canada

*Corresponding author: E-mail: [email protected].
Associate editor: Hideki Innan

Abstract

Indirect tests have detected recombination in mitochondrial DNA (mtDNA) from many animal lineages, including mammals. However, it is possible that features of the molecular evolutionary process without recombination could be incorrectly inferred by indirect tests as being due to recombination. We have identified one such example, which we call “patchy-tachy” (PT), where different partitions of sequences evolve at different rates, that leads to an excess of false positives for recombination inferred by indirect tests. To explore this phenomenon, we characterized the false positive rates of six widely used indirect tests for recombination using simulations of general models for mtDNA evolution with PT but without recombination. All tests produced 30–99% false positives for recombination, although the conditions that produced the maximal level of false positives differed between the tests. To evaluate the degree to which conditions that exacerbate false positives are found in published sequence data, we turned to 20 animal mtDNA data sets in which recombination is suggested by indirect tests. Using a model where different regions of the sequences were free to evolve at different rates in different lineages, we demonstrated that PT is prevalent in many data sets in which recombination was previously inferred using indirect tests. Taken together, our results argue that PT without recombination is a viable alternative explanation for detection of widespread recombination in animal mtDNA using indirect tests.

Key words: recombination, false positives, mitochondrial DNA, heterotachy, substitution rate heterogeneity, animal mtDNA.

Research article

Introduction

There is considerable evidence to support the phenomenon of recombination between different animal mitochondrial DNA (mtDNA) molecules. Mammalian mitochondrial protein extracts can catalyze recombination (Thyagarajan et al. 1996) and mtDNA genomes may mix because mitochondria are capable of fusion (Wilson 1916; Bereiter-Hahn and Voth 1994). Mitochondrial recombinant genotypes can be detected when two parental cells are fused (Birky 2001), and recombination products have been directly observed in cases where both paternal and maternal mtDNA were inherited: in the gonads of male bivalve mussels (Ladoukakis and Zouros 2001) and in the muscle cells of a human individual (Kraytsberg et al. 2004). Although recombination has been observed in animal mtDNA, this recombination is an exception to two common assumptions in mitochondrial genetics: strict maternal inheritance and vegetative segregation. In other animals, maternal inheritance and vegetative segregation are generally thought to maintain the presence of only one mtDNA genotype in an individual (Birky 2001), a state referred to as homoplasmy.

Whether mtDNA recombination is pervasive enough to require a serious reevaluation of animal population studies remains contentious due to challenges associated with collecting direct evidence of recombination, which stem, at least in part, from the prevalence of homoplasmy (Neiman and Taylor 2009). Due to mtDNA homoplasmy, “indirect” tests for recombination have been developed wherein recombination is inferred from patterns of molecular variation as opposed to direct comparison of nonrecombined parental sequences to potentially recombined sequences. The basis of indirect tests is that over evolutionary time, occasionally more than one mtDNA genotype can be present in a cell and if recombination occurs between these molecules, it should be detectable. The molecular signature left behind by recombination could include an uneven distribution of polymorphic sites (Maynard Smith 1992; Posada and Crandall 2001), regions with high sequence similarity (Sawyer 1989), clustering of phylogenetically incompatible sites (Jakobsen and Easteal 1996), or a correlation of linkage disequilibrium with physical distance (Awadalla et al. 1999; Piganeau et al. 2004). The use of indirect recombination tests has led to reports of recombination in diverse animal mtDNA including crustaceans, amphibians (Ladoukakis and Zouros 2001), lizards (Ujvari et al. 2007), scorpions (Gantenbein et al. 2005), fish (Ciborowski et al. 2007), birds, insects, nematodes, and mammals including nonhuman primates (Maynard Smith 1992; Ladoukakis and Zouros 2001; Piganeau et al. 2004; Tsaousis et al. 2005; White and Gemmell 2009) and humans (Kraytsberg et al. 2004). Such widespread animal mtDNA recombination raised serious concerns about the validity of clonal mtDNA inheritance, and all the models and conclusions based on this assumption.

If recombination is prevalent, efforts to infer evolutionary relationships using bifurcating phylogenies are clearly inappropriate. Evolutionary relationships are inferred using models whose parameter values are based on genetic information stored in sequences. Phylogenetic inference of a single evolutionary history (a phylogenetic tree) is confounded when different parts of the data have different evolutionary histories, which would be more accurately represented by multiple evolutionary trees. Relevant to this issue is the concept of homoplasy (not to be confused with the previously described homoplasmy), which is when phylogenetic conflict arises due to convergent evolution. Although recurrent substitutions are clearly an important source of homoplasy in mtDNA, there is no broad consensus concerning the degree to which recombination contributes to phylogenetic conflict in molecular data. This raises concerns about the ability of indirect tests to distinguish between phylogenetic conflict caused by recombination and homoplasy caused by recurrent substitutions (Galtier et al. 2006).

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Mol. Biol. Evol. 28(9):2549–2559. 2011 doi:10.1093/molbev/msr076 Advance Access Publication April 15, 2011

Mutation rate heterogeneity has been suggested as a source of false positives in indirect tests for recombination (Pesole et al. 1999; Innan and Nordborg 2002; Galtier et al. 2006). Within mammalian mtDNA, mutation rates of nonsynonymous sites are highly variable from region to region (Pesole et al. 1999) and the mutation rates of specific mtDNA sites can change quickly over time, even between congeneric species (Galtier et al. 2006). Studies of mutational processes suggest that clustering of mutational events (Drake 2007) and localized elevated rates of mutational events (Wang et al. 2007; Chen et al. 2009) are frequently observed. Mutation rate heterogeneity seems to produce the most false positives when there is a high degree of contrast between rates of adjacent sites. This is supported by Innan and Nordborg (2002), where simulations with both hot and cold spots produced a significant correlation between linkage disequilibrium and distance, whereas simulations with only hot or only cold spots did not.

However, previous studies have not recovered an elevated level of false positives across diverse types of indirect recombination tests. Even with θ = 200 and a mutation rate variation parameter α = 0.05, fewer than 10% false positives were produced by Max χ², GENECONV, and Reticulate (Posada and Crandall 2001). As high θ and low α are the conditions under which substitution hot spots are expected to occur, these results suggest that a simple substitution hot spot model over all sequences does not produce a substantial level of false positives. A more sophisticated substitution hot spot model over all sequences was investigated in Bruen et al. (2006) using highly correlated rates between neighboring sites and high substitution rate heterogeneity. Elevated levels of false positives were produced, but only by Max χ² and Reticulate and only when the site-to-site rate correlation was extremely high. Although these studies demonstrated that substitution rate heterogeneity was capable of producing false positives for recombination, their results generally suggested that most indirect tests were not susceptible to such heterogeneity under biologically realistic conditions. Therefore, the indirect evidence for widespread animal mtDNA recombination could not be discounted.

Thus, given that the mutation/substitution process can lead to unusual and varied patterns that can potentially be inappropriately inferred as due to recombination, we explored a model of substitution rate heterogeneity where subsets of the taxa might have a different rate of evolution for a portion of the data. In such a model, trees estimated from partitions share the same topology but have different branch lengths for a portion of the topology. Hereafter, we refer to this model as the patchy-tachy (PT) model. PT combines traditional models, including heterotachy (Lopez et al. 2002), which assume a single tree topology and set of branch lengths with large patches of sequences undergoing different rates of evolution across all taxa for that section. The key difference between PT and heterotachy is that in PT only a subset of the taxa (rather than all of the taxa) have accelerated or decelerated rates of evolution in a patch of the sequence.

Using simulations with varying levels of diversity, sample size, length, and other attributes, we described the effects of PT on the false positive rates of six indirect tests: GENECONV (Sawyer 1989, 1999), Max χ² (Maynard Smith 1992), LD r² (Hill and Robertson 1968; Awadalla et al. 1999; Piganeau et al. 2004), LD |D′| (Lewontin 1964; Awadalla et al. 1999; Piganeau et al. 2004), Reticulate (Jakobsen and Easteal 1996), and PHI (Bruen et al. 2006). Because the power of these tests has been explored in depth elsewhere (Posada and Crandall 2001; Wiuf et al. 2001; Bruen et al. 2006), power was not investigated here. These tests are among the most powerful methods of detecting recombination when direct identification of recombinants is not feasible. They have been used to screen for recombination in a wide range of animal mtDNA.

To determine if PT might be a factor in published sequence data, we developed a simple test. This method is based on the relative likelihoods of models where the alignment is divided into partitions of various lengths. In this test, all partitions are assumed to share the same tree topology but evolutionary rates are estimated independently by partition. In this way, the relative rates of subsets of taxa in different partitions in an alignment of sequences can be compared. We used this method to test for PT in 20 animal mtDNA data sets in which recombination was detected via indirect tests. We then tested whether the mtDNA-estimated level of PT would produce an elevated rate of false positives using simulations modeled on the animal mtDNA data sets themselves. Additionally, the results of these experiments, which did not include recombination, were compared with the level of PT detected in simulations with recombination. Our results provide a reasonable, biologically feasible alternative to the inference of widespread recombination in animal mtDNA.

Materials and Methods

Tests for Recombination

The false positive rates of six widely used tests of recombination (table 1) were evaluated. Detailed explanations of these tests can be found in the Supplementary Material online, and in Posada and Crandall (2001). For GENECONV, we used 1,000 permutations and a gscale of 0. This test is abbreviated as GCG when both inner and outer fragments are considered and GCI when only inner fragments are used. The Max χ² implementation of Piganeau et al. (2004) was used. In Reticulate, sites with more than two alleles were ignored. In PHI, w was set to 100. The versions of LD r² and LD |D′| were the implementations used by Piganeau et al. (2004).

Table 1. Examples of Tests Used to Detect Recombination in Animal mtDNA.

Test          Used in
GENECONV      Piganeau et al. (2004), Tsaousis et al. (2005), Ujvari et al. (2007), Gantenbein et al. (2005), Lawson and Zhang (2009)
LD |D′|       Piganeau et al. (2004), White and Gemmell (2009), Gantenbein et al. (2005)
LD r²         Piganeau et al. (2004), Awadalla et al. (1999), White and Gemmell (2009), Gantenbein et al. (2005)
Max χ²        Piganeau et al. (2004), Tsaousis et al. (2005), Bruen et al. (2006), Ujvari et al. (2007), White and Gemmell (2009), Gantenbein et al. (2005), Maynard Smith and Smith (1999)
PHI           Bruen et al. (2006), White and Gemmell (2009)
Reticulate    Tsaousis et al. (2005), Bruen et al. (2006), White and Gemmell (2009), Fitzgerald et al. (1996)

Detecting False Recombination Signals in Nonrecombining Simulations with PT

To investigate the effect of PT on the level of false positives, we simulated data sets without recombination but with PT such that each third of the sequence of a select clade evolves at a different rate in a closely related subset of taxa (fig. 1). The method to create these simulations is outlined in figure 2.

The creation of a simulated nucleotide data set began by generating a random tree topology in ms (Hudson 2002) with N sequences. From this tree, a clade of n sequences was chosen to be the PT clade. A copy of the tree was created and modified by scaling the internal and terminal branches of this PT clade by the factor pt. The unmodified tree was used to create a partial alignment containing all sites outside the PT region. The PT tree was used to build the remainder of the alignment, which contained all sites inside the PT region. Seq-Gen (Rambaut and Grassly 1997) was used to create the nucleotide alignments. Because the unmodified and PT trees differed only in some branch lengths and not in tree topology, the alignments did not include any recombination. A Jukes–Cantor model was used because at this point we are interested in a general characterization of the effect of PT on the performance of indirect tests for recombination. Later, when we test the effect of PT on simulated animal mtDNA, we use a more appropriate model of evolution. The nucleotide data set was then screened for the total number of segregating sites, S. Unless the number of segregating sites matched the target S, the entire process was repeated, beginning at the generation of tree topology. The target value of S was chosen so that the simulations would reflect observed values of S from nature (specifically, a mtDNA data set from Sulawesi macaques; Evans et al. 1999).
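The tree-modification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts used in the study: the `Node` class and the function names are hypothetical, the stem branch leading to the PT clade is assumed to be left unscaled, and tree generation (ms) and sequence simulation (Seq-Gen) are outside the sketch. A helper that counts segregating sites in an alignment is included because the procedure rejects data sets whose S differs from the target.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    """A node in a rooted tree; `length` is the branch leading to this node."""
    name: str
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def copy_tree(node: Node) -> Node:
    """Deep-copy a tree so that the unmodified tree is left untouched."""
    return Node(node.name, node.length, [copy_tree(c) for c in node.children])


def find(node: Node, name: str) -> Optional[Node]:
    """Return the node called `name`, or None if it is not in the tree."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def scale_clade(node: Node, pt: float) -> None:
    """Scale every internal and terminal branch below `node` by `pt`.

    Assumption: the stem branch of the clade itself is not scaled.
    """
    for child in node.children:
        child.length *= pt
        scale_clade(child, pt)


def make_pt_tree(tree: Node, clade_root: str, pt: float) -> Node:
    """Return a copy of `tree` whose PT clade has branch lengths scaled by `pt`."""
    pt_tree = copy_tree(tree)
    clade = find(pt_tree, clade_root)
    if clade is None:
        raise ValueError(f"clade {clade_root!r} not found in tree")
    scale_clade(clade, pt)
    return pt_tree


def segregating_sites(alignment: Sequence[str]) -> int:
    """Count alignment columns in which more than one state is present."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)
```

Because only branch lengths change and the topology is copied unchanged, sequences simulated on the two trees and then concatenated contain no recombination by construction.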

Various PT attributes were considered, including the difference in mutation rate inside versus outside the PT region, length of the PT partition in proportion to total sequence length, size of PT-affected clade, and the location of the PT partition within the alignment (specifically, in the middle surrounded by higher rate regions, or alternatively, at the periphery). These parameters and the symbols used to represent them are summarized in table 2.

FIG. 1. Example of five partitioned models for N sequences of L nucleotides. Partition A is represented by solid lines, partition B is represented by broken lines. Partitions A and B share the same topology but branch lengths of each section are estimated independently.

Empirically Based Simulations

We also developed an approach to evaluate whether real data sets have PT where different rates can occur either within closely related taxa or within unrelated taxa (fig. 3). We focused on 20 animal mtDNA data sets from which recombination has been indirectly detected. These included 19 of the 20 animal mtDNA data sets from Piganeau et al. (2004) with the lowest probability for the null hypothesis of no recombination. Mytilus galloprovincialis, one of the top 20 data sets of Piganeau et al. (2004), was excluded because recombination has been directly detected in this species (Ladoukakis and Zouros 2001). We also included mtDNA sequences of macaques collected from the Indonesian island of Sulawesi (Evans et al. 1999). These sequences are from seven macaque species: Macaca nigra, M. nigrescens, M. hecki, M. tonkeana, M. ochreata, M. brunnescens, and M. maura.


FIG. 2. Creating PT data sets with simulated data sets where a part of the sequence has a different rate of evolution within a clade. As explained more fully in the text, first a tree topology is generated using ms and the sequences are partitioned into three segments (fig. 1). A randomly chosen clade (n) is simulated with different rates in partition B from partition A. The resultant partitions are concatenated and checked for the correct number of segregating sites, until 1,000 data sets have been generated.

Noncoding regions were removed and data sets were aligned using muscle (Edgar 2004). A single consensus topology was obtained using MrBayes (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). GTR+Γ+codon parameters were estimated by partitioning the alignment into two partitions (A and B) according to the partition models (fig. 1).

Comparing the Maximum Likelihood of the Favored Partitioned Model with the Nonpartitioned Model

Maximum likelihood (ML) parameters were estimated from each partition using baseml from PAML (Yang 1997, 2007).

Table 2. PT Simulation Parameters.

Symbol    Description                               Values
N         Number of sequences/alignment             15, 30
L         Sequence length                           600 bp; 1,200 bp
n         Number of sequences with cold spot        1/3 N, 2/3 N
l         Length of cold spot                       1/3 L, 2/3 L
pt        PT scale factor                           0.05, 0.2, 0.5, 1.0
p         Position of cold spot within alignment    Middle, side
S         Number of segregating sites               0.05L, 0.1L, 0.2L, 0.5L

The lnLmax for partition A and the lnLmax for partition B were summed to obtain the total likelihoods with alternative partition regimes. Using the Akaike information criterion (AIC; Akaike 1974), the null model (AAA) and PT models (ABA, BAA, AAB, or B!A+B!; fig. 1) were compared. The AIC was calculated as 2(df) − 2lnL, where there are 9 df in the AAA model (5 rate ratios, 3 nucleotide frequencies, and 1 Γ shape parameter), and 18 in all other models (9 df for partition A and 9 df for partition B). AIC weights were calculated as in Wagenmakers and Farrell (2004) and represent the relative likelihood of the model being the best model.
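The model comparison above can be illustrated with a short Python sketch. It computes AIC as 2(df) − 2lnL and converts a set of AIC values into Akaike weights using the standard formula w_i = exp(−Δ_i/2) / Σ_k exp(−Δ_k/2), where Δ_i is the difference between model i's AIC and the smallest AIC. The function names are hypothetical, and any log-likelihoods passed in are placeholders to be supplied by the user (for a partitioned model, the sum of the per-partition lnLmax values).

```python
import math
from typing import Dict


def aic(n_params: int, ln_l: float) -> float:
    """Akaike information criterion: 2 * df - 2 * lnL."""
    return 2 * n_params - 2 * ln_l


def partitioned_ln_l(ln_l_a: float, ln_l_b: float) -> float:
    """Total log-likelihood of a two-partition model (partitions A and B)."""
    return ln_l_a + ln_l_b


def akaike_weights(aic_values: Dict[str, float]) -> Dict[str, float]:
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values.values())
    # Relative likelihood of each model given the data.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_values.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}
```

With 9 df for AAA and 18 df for a two-partition model, the partitioned model is preferred only when its summed log-likelihood exceeds that of AAA by more than 9 units, since the AIC penalty differs by 2 × 9 = 18.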

The GTR+Γ+codon parameters and partition pattern of the “best partitioned model” were used to create 1,000 partitioned simulations in Seq-Gen (i.e., one Seq-Gen run per partition). The length, sample size, and S of the simulations were matched to the real data sets analyzed by Piganeau et al. (2004) (supplementary table 1, Supplementary Material online). Partitions A and B used the same MrBayes consensus topology, thereby ensuring recombination was not included. Sequences from partitions A and B were then concatenated according to the favored partition regime and tested for recombination. Null (no-PT, no recombination) AAA simulations were created in a similar manner except that only a single set of ML parameters and a single Seq-Gen simulation was used.

Testing Animal mtDNA for PT

There are two criteria in the test for PT: 1) the data favor a PT model (see fig. 1) rather than the null AAA model and 2) branch lengths estimated from partition A must be significantly different from the corresponding branch lengths in partition B. The type of partitioned model favored by the data was determined using the AIC as described above.

First, branch lengths were estimated from each partition. A distribution of null branch length ratios was calculated by partitioning the null AAA simulations into sections according to each PT model and then dividing branch lengths of corresponding branches from each section. A distribution of observed branch length ratios under the PT model was calculated by dividing each branch length from partition A by the corresponding branch length from partition B. Branches where the rate in partition A equals the rate in partition B would have a branch length ratio of 1. Branches with a large rate difference between partition A and partition B would deviate further from 1. Each branch length ratio from the PT model was then compared with the null distribution of branch length ratios from the null (AAA) model and a P value was calculated. These P values represented the proportion of null branch length ratios that were equally or more extreme than the simulated PT branch length ratio. P values less than or equal to 0.05 were marked as possible signals for PT. The false discovery rate (FDR; Benjamini and Hochberg 1995) was used to correct the number of potential PT signals for multiple tests.
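A minimal Python sketch of the two statistical steps above follows. It is an illustration, not the analysis code of the study. Two assumptions are made that the text does not specify: “more extreme” is measured as the absolute log of the ratio (distance from a ratio of 1 on a log scale), and all ratios are strictly positive. The function names are hypothetical. The multiple-test correction is the standard Benjamini–Hochberg step-up procedure.

```python
import math
from typing import List, Sequence


def empirical_p(observed_ratio: float, null_ratios: Sequence[float]) -> float:
    """Proportion of null ratios at least as far from 1 as the observed ratio.

    Assumption: distance from 1 is measured as |log(ratio)|.
    """
    obs = abs(math.log(observed_ratio))
    hits = sum(1 for r in null_ratios if abs(math.log(r)) >= obs)
    return hits / len(null_ratios)


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return a rejection flag per test, controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k with p_(k) <= (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

In this sketch, each branch of the tree contributes one P value from `empirical_p`, and the branches flagged by `benjamini_hochberg` are those that would be retained as PT signals after correction.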


FIG. 3. Measuring PT in animal mtDNA is done by separating the sequence into the partitions in figure 1 and calculating a Bayesian topology. The fit of the different partitions is tested by their AIC values. If a partition is preferred, the estimated parameters are used to simulate further data sets to test the ratios of branch lengths.

Testing Simulated Data with Recombination for PT

It is also possible that true recombination events might lead to PT. Therefore, further simulations were carried out to quantify such signals that might be generated in association with recombination. Two levels of recombination were tested. In the first, biologically feasible recombination rates based on autosomal DNA and mutation rates based on mtDNA (c and µ, respectively) were used. We used a value of µ derived from great apes of 5.2266 × 10⁻⁷ mutations/site/generation (Lynch 2007). To calculate the recombination rate, c, we assumed the same ratio of recombination to mutation per nucleotide site as in humans, c/µ = 0.6 (Ptak et al. 2004; Lynch 2007). Using these values, we calculated c to be 3.14 × 10⁻⁷ events/site/generation. Other estimates of primate autosomal recombination rates have been as low as c = 1.2 × 10⁻⁸ with µ = 2 × 10⁻⁸ (Becquet and Przeworski 2007). This suggests that our estimate for autosomal recombination rate is high. To test an extreme bound for c (3.14 × 10⁻⁶), we increased the autosomal recombination rate by a factor of 10. This second c was intentionally set extremely high to emphasize any possible effect of recombination on PT detection (µ was left unchanged). Each model was tested with 100 simulations that were generated using RECODON (Arenas and Posada 2007). Simulations were tested for PT in the same manner as described in the section “Testing Animal mtDNA for PT.” Reticulate, PHI, and Max χ² were used to measure the level of detectable recombination.
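The arithmetic behind the two recombination rates can be checked with a few lines of Python. The variable and function names are hypothetical; the numerical inputs are the values quoted in the text.

```python
def recombination_rate(mu: float, c_over_mu: float) -> float:
    """Recombination rate per site per generation from mu and the ratio c/mu."""
    return c_over_mu * mu


MU = 5.2266e-7    # mutations/site/generation (value quoted from Lynch 2007)
C_OVER_MU = 0.6   # ratio of recombination to mutation per site, as in humans

c_feasible = recombination_rate(MU, C_OVER_MU)  # about 3.14e-7 events/site/generation
c_extreme = 10 * c_feasible                     # about 3.14e-6, the extreme bound

print(f"{c_feasible:.2e}", f"{c_extreme:.2e}")
```

The same relation shows why the first value is described as high: the alternative estimates quoted above (c = 1.2 × 10⁻⁸ with µ = 2 × 10⁻⁸) imply the same ratio of 0.6 but a recombination rate more than an order of magnitude lower.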

Results

Effect of PT on Recombination False Positives

The proportion of simulated data sets with no recombination that falsely detect recombination when the simulations have PT is shown in figure 4. When N = 15, the level of false positives detected by Reticulate ranged from 0.07% to 67%, by PHI ranged 0.04–34%, by GENECONV ranged 4–99.7%, by Max χ² ranged 3–99.9%, by LD r² ranged 4–75%, and by LD |D′| ranged 0.09–99.7%. When N = 30, the level of false positives was largely unchanged except for PHI, which no longer detected an elevated level of false positives, and LD r², whose maximum false positive rate dropped from 72% to 52%. The level detected by Reticulate ranged from 0.07% to 72%, by PHI ranged 0.01–10%, by GENECONV ranged 4–99.8%, by Max χ² ranged 4–99.9%, by LD r² ranged 4–52%, and by LD |D′| ranged 1–99.6%. Generally, in simulations with PT, the likelihood of a test reporting a false positive increased as the scaling factor for clade n became more extreme. This effect was particularly pronounced when the sequences contained a high proportion of polymorphic sites. When the region with PT sites is located at the edge of the alignment, the level of false positives of PHI, LD r², and LD |D′| decreases to below 10% but Reticulate, GENECONV, and Max χ² false positives for recombination were unaffected (supplementary fig. 1 vs. supplementary fig. 3; supplementary fig. 2 vs. supplementary fig. 2, Supplementary Material online). Shortening the overall sequence length results in fewer false positives for recombination (supplementary fig. 1 vs. supplementary fig. 5; supplementary fig. 2 vs. supplementary fig. 6, Supplementary Material online). Generally, the greatest fluctuation in the number of false positives between repeated simulations was approximately 2%. In simulations without PT (a scaling factor of 1), the tests correctly returned approximately 5% false positives.

FIG. 4. False positives in a general PT model. Percentage of recombination false positives detected in data sets with sequences 1,200 bp long, where the middle 800 bp in one-third of the sequences are scaled by pt. Top row: 15 sequences. Bottom row: 30 sequences. Black circles represent 800–1,000 simulations that were tested for recombination, and gray circles mark data collected from fewer than 800 simulations. Abbreviations: patchy-tachy (pt); Reticulate (RET); PHI (PHI); Max χ² (MX); GENECONV inner fragments only (GCI); LD r² (LDR); LD |D′| (LDD).

The rate of evolution influenced whether scaling branch lengths of a subset of the taxa or of all taxa in the simulations gave results with more false positives for recombination. With a moderate mutation rate, the level of false positives for recombination is lower when the substitution rate of a partition is scaled in all sequences than when it is scaled in only a subset of sequences (table 3a and 3b). However, the opposite is true when the mutation rate is high (table 3c).

Detecting PT in Animal mtDNA

Three of the 20 animal data sets tested did not have detectable PT. Mandrillus sphinx favored a nonpartitioned AAA model over a partitioned one (table 4), whereas Dendroica petechia (warbler) and Macrodon ancylodon (king weakfish) were found not to possess PT after the FDR correction was applied. If we consider 5% or less as an acceptable level of false positives, Macrodon ancylodon and, according to GENECONV and LD r², Mandrillus sphinx and Dendroica petechia did not have detectable PT but did have elevated false positives for recombination based on simulations without recombination using evolutionary parameters derived from this data set (table 5). All other data sets had PT and also had an elevated false positive recombination rate, although the degree of false positives ranged from 99% to 7% (Microtus longicaudus [vole] to Merlangius merlangus [whiting]

Table 3. Effect of Cold Clade Size (n) with Different Background Mutation Rates.

(a) Moderate background mutation rate (S = 240 sites): c = 0.05

n     RET    PHI    GCG    GCI    MX     LDR    LDD
5     8      6      9      5      7      4      16
10    10     7      38     35     35     68     13
15    7      6      17     12     22     9      17

(b) Moderate background mutation rate (S = 240 sites): c = 0.2

n     RET    PHI    GCG    GCI    MX     LDR    LDD
5     6      7      11     7      7      5      12
10    14     7      89     88     69     83     17
15    5      5      31     22     33     10     11

(c) High background mutation rate (S = 600 sites): c = 0.05

n     RET    PHI    GCG    GCI    MX     LDR    LDD
5     66.6   34.0   99.7   99.7   99.9   75.3   99.7
10    23.3   6.9    99.8   99.8   98.9   95.4   54.2
15    63.6   17.5   99.7   99.7   99.9   48.0   99.9

NOTE.—N = 15, L = 1,200 bp, p = middle, l = 800 bp.


Sun et al. · doi:10.1093/molbev/msr076 MBE

Table 4. AIC Values for Partition Models.

Data set                                     AAA weight        Best partitioned model   Weight
Sulawesi macaques                            7.590 × 10^−65    AAB                      0.999
Bursaphelenchus conicaudatus (nematode)      1.261 × 10^−46    AAB                      0.907
Micropterus salmoides (bass)                 6.225 × 10^−16    AAB                      0.999
Macaca nemestrina (pig-tailed macaque)       2.602 × 10^−27    AAB                      0.999
Microtus longicaudus (vole)                  5.266 × 10^−56    AAB                      0.999
Vesicomya pacifica (bivalve)                 6.845 × 10^−10    B!A+B!                   0.910
Mandrillus sphinx (mandrill)                 0.396             ABB                      0.277
Libellula quadrimaculata (dragonfly)         2.090 × 10^−5     ABA                      0.791
Dendroica petechia (warbler)                 7.766 × 10^−2     ABA                      0.496
Apodemus sylvaticus (woodmouse)              3.526 × 10^−38    B!A+B!                   0.999
Gomphiocephalus hodgsoni (springtail)        2.262 × 10^−3     AAB                      0.722
Alpheus lottini (snapping shrimp)            1.493 × 10^−12    B!A+B!                   0.621
Macrodon ancylodon (king weakfish)           2.343 × 10^−7     AAB                      0.965
Papio papio (baboon)                         2.057 × 10^−3     BAA                      0.981
Campylorhynchus brunneicap (wren)            9.818 × 10^−3     ABA                      0.526
Bradypodion occidentale (dwarf chameleon)    6.202 × 10^−5     AAB                      0.907
Passerella iliaca (sparrow)                  7.915 × 10^−2     AAB                      0.653
Merlangius merlangus (whiting)               5.456 × 10^−3     ABA                      0.994
Gonatus onyx (squid)                         1.771 × 10^−2     ABA                      0.976
Grus antigone (crane)                        2.971 × 10^−3     B!A+B!                   0.772

NOTE.—AIC weights for the no partition model (AAA) and the best partitioned model. AIC weights represent the relative likelihood of the model being the best model. Data sets are ordered by strength of evidence for recombination according to Piganeau et al. (2004).

according to GENECONV; table 5). It is interesting to note that data sets with stronger evidence for recombination as reported in Piganeau et al. (2004) also had parameter estimates that produced nonrecombining PT simulations with the highest levels of false positives. In summary, 17 of the 20 data sets both possess detectable PT and have parameter estimates that produced an elevated level of false positives for recombination in simulations without recombination.
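The FDR correction used when screening data sets for PT is not described in this part of the text. The reference list cites Benjamini and Hochberg (1995), so the following is a minimal sketch of that step-up procedure under the assumption that it is the correction meant. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q.

    Sketch of the Benjamini-Hochberg step-up procedure; assumed, not confirmed,
    to be the FDR correction applied in the study.
    """
    m = len(p_values)
    # Indices of the tests, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five data sets (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(example_p))
```

With these illustrative values, the third and fourth tests have raw p-values below 0.05 but are not rejected after correction, which is the kind of outcome described for Dendroica petechia and Macrodon ancylodon.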

Effect of Recombination on the Incidence of PT
The mean number of recombination events using the autosomal rate of recombination, c, was 6 events/tree. In simulations generated using the extreme c, the mean number of recombination events was 58 events/tree (fig. 5). Although all simulations were recombinant, only 8% of them had detectable recombination under the autosomal recombination model based on the indirect tests we examined. Of these 8%, PT was detected in 25% (table 6). Under the extreme recombination model, 27% of simulations had detectable recombination and, of these, PT was detected in about 52%.

Discussion
Effect of PT on the Level of Recombination False Positives
Using simulations under a simple Jukes–Cantor and a

Table 5. False Positives for Recombination in Simulations of Animal mtDNA.

Data set                                     PT detected   RET   PHI   GCG   GCI   MX    LDR   LDD
Mandrillus sphinx (mandrill)                 N             1     4     10    5     2     32    4
Dendroica petechia (warbler)                 N             0     0     12    11    5     13    0
Macrodon ancylodon (king weakfish)           N             19    12    97    97    60    81    12
Sulawesi macaques                            Y             7     10    33    29    89    60    7
Bursaphelenchus conicaudatus (nematode)      Y             15    30    80    78    93    72    15
Micropterus salmoides (bass)                 Y             7     8     43    35    23    50    9
Macaca nemestrina (pig-tailed macaque)       Y             18    13    44    41    83    64    11
Microtus longicaudus (vole)                  Y             8     16    99    99    92    19    3
Vesicomya pacifica (bivalve)                 Y             6     3     58    53    17    38    1
Libellula quadrimaculata (dragonfly)         Y             5     5     14    11    7     3     5
Apodemus sylvaticus (woodmouse)              Y             7     6     11    9     7     7     2
Gomphiocephalus hodgsoni (springtail)        Y             5     2     15    14    7     28    2
Alpheus lottini (snapping shrimp)            Y             5     6     9     6     5     3     9
Papio papio (baboon)                         Y             0     2     28    21    8     43    2
Bradypodion occidentale (dwarf chameleon)    Y             1     0     28    22    23    44    1
Campylorhynchus brunneicap (wren)            Y             8     8     10    9     3     9     8
Passerella iliaca (sparrow)                  Y             3     5     24    22    7     43    7
Merlangius merlangus (whiting)               Y             0     0†    7     5     2     5     1†
Gonatus onyx (squid)                         Y             0†    0†    8     5     2†    9†    0†
Grus antigone (crane)                        Y             0     0†    50    40    13    21    1†

NOTE.—Values listed are the percentage of 1,000 simulations where recombination was detected. Values with † had fewer than 1,000 simulations tested due to low levels of polymorphism in some simulations. Data sets are organized by presence/absence of PT, then by strength of evidence for recombination according to Piganeau et al. (2004).


“Patchy-Tachy” Leads to False Positives for Recombination · doi:10.1093/molbev/msr076 MBE

FIG. 5. The distribution of recombination events in simulations with an autosomal rate of recombination (A) and an extreme rate of recombination (B). Each distribution consists of 100 runs.

model of PT within a subset of closely related clades, we demonstrated the severe degree to which PT can magnify the level of false positives detected by indirect tests for recombination. These simulations demonstrate the relative performance of indirect tests given a particular set of simulations with PT in a subset of related taxa. The substitution distribution tests (GENECONV, Max χ²) performed the worst, followed by LD |D′|, Reticulate, LD r², and PHI, respectively. Two factors influence the magnitude of this PT effect: the number of segregating sites and the degree of scaling between partitions of sites.

Performance of Substitution Distribution Tests for Recombination
These tests assume that there is an even distribution of segregating sites when recombination is not present. This assumption is violated by PT, which causes substitutions to be unevenly distributed across and between sequences. The GENECONV criterion of considering only inner fragments (GCI) is considered more conservative than the alternative of considering all global fragments detected (GCG); however, GCI finds the same high level of false positives as GCG (supplementary figs. 5–7, Supplementary Material online).
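To make the even-distribution assumption concrete, the following is a simplified, pairwise sketch of the maximum chi-squared idea (Maynard Smith 1992). It is not the script used in the study; the function names, the 0/1 encoding of sites, and the example vectors are assumptions made for illustration. Each polymorphic site is coded 1 if the two sequences differ there and 0 otherwise, every possible breakpoint splits the sites into a left and right block, and the largest 2 × 2 chi-squared value is compared with values from shuffled site orders.

```python
import random


def max_chi2(diff):
    """Largest 2x2 chi-squared over all breakpoints.

    diff: list of 0/1 flags, one per polymorphic site, 1 where the pair differs.
    """
    m = len(diff)
    total = sum(diff)
    best = 0.0
    left = 0
    for k in range(1, m):  # breakpoint between site k-1 and site k
        left += diff[k - 1]
        a, b = left, k - left                    # differences / matches on the left
        c = total - left                         # differences on the right
        d = (m - k) - c                          # matches on the right
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue                             # table is degenerate, skip it
        best = max(best, m * (a * d - b * c) ** 2 / denom)
    return best


def permutation_p_value(diff, n_perm=1000, seed=0):
    """Share of shuffled site orders with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = max_chi2(diff)
    shuffled = list(diff)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi2(shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical pair: differences spread evenly versus confined to one block,
# as might happen when one sequence carries a slowly evolving (cold) region.
even = [1, 0] * 10
clustered = [1] * 10 + [0] * 10
print(max_chi2(even), max_chi2(clustered), permutation_p_value(clustered, 200))
```

In this toy case no recombination is involved, yet confining the differences to one block drives the statistic to its maximum and the permutation p-value toward zero, which is the kind of signal a substitution distribution test would read as a breakpoint.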

Other substitution distribution methods include the Homoplasy Test (Maynard Smith and Smith 1998), Informative Sites Test (PIST; Worobey 2001), Chimaera (Posada

Table 6. Recombination and PT in Recombined Simulations Compared with the PT Simulation.

                                          Simulations
Recombination detected   PT detected     High c   Extreme c   mtDNA
No                       No              61       42          10
No                       Yes             31       31          10
Yes                      No              6        13          5
Yes                      Yes             2        14          75

NOTE.—Percentage of simulations in which recombination was detected with Reticulate, PHI, or Max χ² and the number of simulations in which PT was detected. High: autosomal recombination rate (c = 3.14 × 10^−7, 100 simulations); Extreme: c = 3.14 × 10^−6, 100 simulations; mtDNA: no-recombination animal mtDNA simulations (c = 0, 1,000 simulations); recombination is considered detected in data sets with over 5% false positives.

and Crandall 2001), the Runs Test (Takahata 1994), and the Sneath Test (Sneath 1995). These substitution distribution tests will probably also produce elevated false positives in a PT model but have not been tested here. This is especially likely given the performance of Max χ² and GENECONV, as they are the most powerful and robust of the substitution distribution methods. Indeed, two tests, the Homoplasy Test and PIST, have been found to produce high levels of false positives with extreme levels of rate variation (Posada and Crandall 2001).

Performance of Linkage Disequilibrium and Distance Tests for Recombination
Although LD r² is a more powerful test than LD |D′| for detecting recombination when it is present (White and Gemmell 2009), LD r² found many more false positives than LD |D′|, suggesting that the gain in power comes at a cost in accuracy. The null hypothesis that, without recombination, linkage disequilibrium does not correlate with distance fails because clusters of unusually fast- or unusually slow-evolving sites (relative to the rest of the alignment) can produce a negative correlation between linkage disequilibrium and distance (Innan and Nordborg 2002). The major difference between LD |D′| and LD r² is that LD |D′| can only measure linkage disequilibrium when all four genotypes are present (the two parentals and two recombinants) and when allele frequencies are moderate to high (Awadalla et al. 2000). It is not surprising, then, that LD r² has a higher level of false positives than LD |D′|, although LD |D′| will still produce an elevated level of false positives under certain conditions.
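The difference between the two statistics can be illustrated with the standard two-site definitions of r² and |D′|. This is a minimal sketch, not the scripts used in the study; the function name and the haplotype counts are hypothetical. It shows that |D′| stays at its ceiling of 1 whenever fewer than four haplotypes are present, whereas r² still varies.

```python
def ld_stats(haplotypes):
    """Return (r2, abs_d_prime) for two biallelic sites.

    haplotypes: list of (a, b) pairs with a, b in {0, 1}, one pair per sequence.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # coefficient of disequilibrium
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / var
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical samples of 10 sequences.
three_haplotypes = [(0, 0)] * 4 + [(1, 0)] * 3 + [(1, 1)] * 3
four_haplotypes = [(0, 0)] * 3 + [(0, 1)] * 1 + [(1, 0)] * 3 + [(1, 1)] * 3
print(ld_stats(three_haplotypes))  # |D'| is 1 (up to rounding), r2 is below 1
print(ld_stats(four_haplotypes))   # |D'| drops below 1 once all four are present
```

Because |D′| carries information only for site pairs showing all four haplotypes, fewer pairs contribute to its correlation with distance, which is consistent with its lower false positive level relative to r².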

Performance of Phylogenetic Compatibility Tests for Recombination
Reticulate and PHI are compatibility methods and, of all the tests evaluated here, they are the most robust to mutation cold spots (but still produced up to 30% false positives). The improved false positive rate of the most robust test, PHI, comes at a price. PHI tends to be overly conservative when there are too few informative sites or too few


incompatibilities; this tends to occur when alignments have fewer than 15 sequences or when nucleotide diversity is below 5% (Bruen et al. 2006).

PT Versus Mutation Rate Heterogeneity
Although PT and rate heterogeneity share similarities, they are not the same phenomenon. Rate heterogeneity, as commonly applied, refers to variation in substitution rates between sites across all sequences (i.e., n = N), whereas PT refers to variation in substitution rates between subsets of sites and in a subset of sequences (i.e., n ≠ N). PT is capable of producing an elevated level of false positives that cannot be accounted for by rate heterogeneity alone (table 3a and 3b). Likewise, rate heterogeneity produces an elevated level of false positives that cannot be accounted for by PT alone (table 3c).
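The distinction between n = N and n ≠ N can be pictured as a grid of rate multipliers with one row per sequence and one column per site. The sketch below uses the settings given in the note to table 3 (N = 15, L = 1,200 bp, a middle region of 800 bp, c = 0.05). The function name is hypothetical, and treating the scaling as a per-sequence multiplier is a simplification: the simulations scale branch lengths within a clade.

```python
def rate_multipliers(n_seqs, n_sites, scaled_seqs, region, factor):
    """Build an n_seqs x n_sites grid of rate multipliers.

    scaled_seqs: set of sequence indices whose rates are scaled.
    region: (start, end) site indices, end exclusive.
    factor: multiplier applied inside the region for the scaled sequences.
    """
    start, end = region
    return [
        [factor if (s in scaled_seqs and start <= j < end) else 1.0
         for j in range(n_sites)]
        for s in range(n_seqs)
    ]


N, L = 15, 1200
middle = (200, 1000)  # middle 800 bp of a 1,200 bp alignment

# Rate heterogeneity: the region is scaled in every sequence (n = N).
heterogeneity = rate_multipliers(N, L, set(range(N)), middle, 0.05)

# Patchy-tachy: the region is scaled only in a cold clade of n = 5 sequences.
patchy_tachy = rate_multipliers(N, L, set(range(5)), middle, 0.05)

print(heterogeneity[14][600], patchy_tachy[14][600], patchy_tachy[0][600])
```

Under rate heterogeneity every row of the grid is identical, so the pattern is a property of sites alone; under PT the rows differ, so the pattern depends on both the site and the lineage.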

In summary, LD |D′|, Reticulate, and PHI are comparatively less susceptible to false positives caused by PT than GENECONV, Max χ², and LD r². When LD |D′|, Reticulate, and PHI do falsely infer recombination, it is probably because sites have mutated multiple times in such a way that all four genotypes are present at a site (LD |D′|) and/or informative sites outside the cold region appear phylogenetically incompatible with sites inside the cold region (Reticulate and PHI). This would explain why simulations with a small cold clade but a large cold region produced the most false positives for LD |D′|, Reticulate, and PHI; a large cold region forces the few remaining noncold sites to accept more mutations per site to maintain the same overall rate of evolution, but if too many clades contain the cold region, there will be too few informative sites to detect any signal.

Doubling the number of sequences in the data set from 15 to 30 did not substantially improve the performance of Max χ², GENECONV, LD |D′|, or Reticulate but improved the performance of LD r² and PHI (fig. 4). This contradicts a previous study, where an increase in the number of sequences from 10 to 50 led to an increase in the level of false positives detected from 10% to over 50% in PHI, Reticulate, and Max χ² (Bruen et al. 2006). This could be due to additional factors included in the previous model, including exponential growth, extreme site-specific rate heterogeneity, and/or the method of simulating mutation hot spots, which would be shared across all sequences.

PT in Animal mtDNA
Using simulations that can detect PT in either closely related or nonclosely related lineages, we explored the possibility that PT exists in animal mtDNA that was previously reported to carry a signature of recombination. We asked how prevalent PT is in mtDNA data sets for which recombination has been reported and whether that degree of PT is sufficient to generate false positives for recombination. PT was detected in 85% of the animal mtDNA data sets and, of these, all had an elevated level of false positives for recombination (table 5). The three data sets that did not have detectable PT were Mandrillus sphinx, Dendroica petechia, and Macrodon ancylodon.
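The 85% figure and the claim that every data set with PT shows elevated false positives can be checked directly against table 5. The sketch below transcribes the table (dagger marks dropped) and applies the 5% threshold stated in the Results; the variable names are chosen here for illustration, and "elevated" is read as at least one test exceeding 5%.

```python
# (data set, PT detected, RET, PHI, GCG, GCI, MX, LDR, LDD), from table 5.
TABLE5 = [
    ("Mandrillus sphinx", False, 1, 4, 10, 5, 2, 32, 4),
    ("Dendroica petechia", False, 0, 0, 12, 11, 5, 13, 0),
    ("Macrodon ancylodon", False, 19, 12, 97, 97, 60, 81, 12),
    ("Sulawesi macaques", True, 7, 10, 33, 29, 89, 60, 7),
    ("Bursaphelenchus conicaudatus", True, 15, 30, 80, 78, 93, 72, 15),
    ("Micropterus salmoides", True, 7, 8, 43, 35, 23, 50, 9),
    ("Macaca nemestrina", True, 18, 13, 44, 41, 83, 64, 11),
    ("Microtus longicaudus", True, 8, 16, 99, 99, 92, 19, 3),
    ("Vesicomya pacifica", True, 6, 3, 58, 53, 17, 38, 1),
    ("Libellula quadrimaculata", True, 5, 5, 14, 11, 7, 3, 5),
    ("Apodemus sylvaticus", True, 7, 6, 11, 9, 7, 7, 2),
    ("Gomphiocephalus hodgsoni", True, 5, 2, 15, 14, 7, 28, 2),
    ("Alpheus lottini", True, 5, 6, 9, 6, 5, 3, 9),
    ("Papio papio", True, 0, 2, 28, 21, 8, 43, 2),
    ("Bradypodion occidentale", True, 1, 0, 28, 22, 23, 44, 1),
    ("Campylorhynchus brunneicap", True, 8, 8, 10, 9, 3, 9, 8),
    ("Passerella iliaca", True, 3, 5, 24, 22, 7, 43, 7),
    ("Merlangius merlangus", True, 0, 0, 7, 5, 2, 5, 1),
    ("Gonatus onyx", True, 0, 0, 8, 5, 2, 9, 0),
    ("Grus antigone", True, 0, 0, 50, 40, 13, 21, 1),
]
THRESHOLD = 5  # percent; 5% or less is treated as acceptable


def elevated(row):
    """True if at least one test exceeds the acceptable false positive level."""
    return max(row[2:]) > THRESHOLD


with_pt = [row for row in TABLE5 if row[1]]
share_with_pt = 100 * len(with_pt) / len(TABLE5)
all_elevated = all(elevated(row) for row in with_pt)
print(len(with_pt), share_with_pt, all_elevated)
```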

Ninety-five percent of the animal mtDNA data sets with previously reported recombination favored a partitioned model over the nonpartitioned AAA model (table 4). Without exception, tests produced more recombination false positives when the data were partitioned than when only one set of parameters and branch lengths was provided for the whole sequence. Admittedly, our method of partitioning is a crude manner to detect PT when modeling empirical mtDNA data sets. Nonetheless, this relatively coarse PT model fits many of these data sets significantly better than a model without PT (see below). An interesting direction for future work would be the development of a more general PT model where the sample size and length of the PT region are estimated from the data.
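The AIC weights in table 4 follow the standard Akaike-weight definition (Wagenmakers and Farrell 2004): each model's AIC is expressed as a difference from the best model, converted to a relative likelihood, and normalized over the candidate set. The sketch below shows that calculation; the function name and the AIC values are hypothetical, since the raw AIC scores are not given in this part of the text.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC scores into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC scores for three candidate partition models.
weights = akaike_weights([1000.0, 998.0, 1003.0])
print([round(w, 3) for w in weights])
```

The weights are normalized over every model in the candidate set. This may be why the two weights listed for a data set in table 4 need not sum to 1, although the full candidate set is not listed here.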

In Piganeau et al. (2004)’s survey of recombination in 267 animal mtDNA, only four data sets were statistically significant: Bursaphelenchus conicaudatus, Macaca nemestrina, Microtus longicaudus, and Micropterus salmoides. We found that, under a PT model, these four “most probable” recombining data sets were consistently among those producing the highest level of false positives across numerous tests (table 5).

Effect of Recombination on the Incidence of PT
These simulations offer an opportunity to compare the level of PT detected in animal mtDNA to the level expected due to recombination. Interestingly, recombination is largely undetected in simulations with recombination that were simulated using a high autosomal estimate of c and even an extreme, artificially high c (table 6). This may be because only Reticulate, PHI, and Max χ² were used to test the recombined simulations. Nevertheless, when we compare the recombined simulations to the animal mtDNA using Reticulate, PHI, and Max χ² only, we can compare the level of PT observed in the animal mtDNA to the expected level due to recombination alone.

Of the 8% of autosomal level recombination simulations with detectable recombination, 25% of these possess PT (table 6). Of the 27% of extreme c simulations with detectable recombination, about half (52%) have PT. In contrast, in the animal mtDNA data sets, 80% of simulations have detectable recombination (using Reticulate, PHI, or Max χ² only), and of these, 94% have PT. This suggests that either the PT detected in the animal mtDNA is due to a phenomenally high recombination rate, much higher than the extreme rate tested here, or that the excess 42–69% of PT in animal mtDNA cannot be explained by any biologically feasible recombination rate.
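The percentages in this comparison follow directly from the four cells per column of table 6. The short sketch below reproduces them, including the 42–69% excess, which is the gap between the mtDNA value and the extreme and autosomal values, respectively. The dictionary layout and function name are chosen here for illustration.

```python
# Percentages from table 6, keyed by (recombination detected, PT detected).
TABLE6 = {
    "high c": {("no", "no"): 61, ("no", "yes"): 31, ("yes", "no"): 6, ("yes", "yes"): 2},
    "extreme c": {("no", "no"): 42, ("no", "yes"): 31, ("yes", "no"): 13, ("yes", "yes"): 14},
    "mtDNA": {("no", "no"): 10, ("no", "yes"): 10, ("yes", "no"): 5, ("yes", "yes"): 75},
}


def summarize(cells):
    """Return (% with recombination detected, % of those that also show PT)."""
    detected = cells[("yes", "no")] + cells[("yes", "yes")]
    total = sum(cells.values())
    return 100 * detected / total, 100 * cells[("yes", "yes")] / detected


summary = {name: summarize(cells) for name, cells in TABLE6.items()}
for name, (detected, pt_given_detected) in summary.items():
    print(name, round(detected), round(pt_given_detected))

# Excess PT in the mtDNA simulations relative to each recombination model.
excess_low = summary["mtDNA"][1] - summary["extreme c"][1]
excess_high = summary["mtDNA"][1] - summary["high c"][1]
print(round(excess_low), round(excess_high))
```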

Conclusion
The possibility of widespread recombination in animal mtDNA caught immediate attention as it implied important consequences for phylogenetic and population studies. We show here that a specific type of mutation heterotachy, which we call PT, is capable of producing extremely high levels of false positives for recombination in indirect tests. We present a method of measuring PT, and using this method


on previously reported recombining animal mtDNA, show that the level of PT present will produce elevated levels of false positives in almost all mtDNA data sets tested. Finally, we also demonstrate that the level of PT measured in the animal mtDNA cannot be explained by recombination alone. These results do not refute the possibility that recombination can and does occur in some animal mtDNA. Rather, they cast doubt on the ability of indirect recombination tests to distinguish between recombination and PT. This finding is consistent with previous studies which suggest that heterotachy is widespread in the mitochondrial genome; at least 28–95% of polymorphic sites are heterotachous in mitochondrial coding regions, and the position of heterotachous sites does not appear to be tied to functional divergence or to the spatial structure of the protein (Lopez et al. 2002).

The indirect tests evaluated in this study have been used in studies screening a much wider range of animal mtDNA than we studied here (Tsaousis et al. 2005; Ujvari et al. 2007). It seems probable that PT in these other data sets could also produce elevated false positives. Other indirect tests that have been used to detect recombination do not consider PT and could produce as many or more false positives as have been reported here.

Supplementary Material
Supplementary figures and tables are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments
The authors thank Dr Gwenael Piganeau, who generously shared her LD r², LD |D′|, and Max χ² scripts and provided assistance in using them. Anonymous reviewers put a great deal of effort into improving this manuscript and we sincerely thank them for their efforts. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (www.sharcnet.ca) and was supported by Natural Sciences and Engineering Research Council discovery grants to G.B.G. and B.J.E.

References
Akaike H. 1974. A new look at the statistical model identification. IEEE Trans Automat Control. 19:716–723.

Arenas M, Posada D. 2007. Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics. 8:458.

Awadalla A, Eyre-Walker A, Maynard Smith J. 2000. Questioning evidence for recombination in human mitochondrial DNA. Science 288:1931a.

Awadalla P, Eyre-Walker A, Maynard-Smith J. 1999. Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286:2524–2525.

Becquet C, Przeworski M. 2007. A new approach to estimate parameters of speciation models with application to apes. Genome Res. 17:1505–1519.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc. 57:289–300.

Bereiter-Hahn J, Voth M. 1994. Dynamics of mitochondria in living cells: shape changes, dislocations, fusion, and fission of mitochondria. Microsc Res Tech. 27:198–219.

Birky CWJ. 2001. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, and models. Annu Rev Genet. 35:125–148.

Bruen TC, Philippe H, Bryant D. 2006. A simple and robust statistical test for detecting the presence of recombination. Genetics 172:2665–2681.

Chen J, Ferec C, Cooper D. 2009. Closely spaced multiple mutations as potential signatures of transient hypermutability in human genes. Hum Mutat. 30:1435–1448.

Ciborowski KL, Consuegra S, García de Leaniz C, Beaumont MA, Wang J, Jordan WC. 2007. Rare and fleeting: an example of interspecific recombination in animal mitochondrial DNA. Biol Lett. 3:554–557.

Drake J. 2007. Too many mutants with multiple mutations. Crit Rev Biochem Mol Biol. 42:247–258.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797.

Evans BJ, Morales JC, Supriatna J, Melnick DJ. 1999. Origin of the Sulawesi macaques (Cercopithecidae: Macaca) as suggested by mitochondrial DNA phylogeny. Biol J Linn Soc. 66:539–560.

Fitzgerald J, Dahl H-HM, Jakobsen IB, Easteal S. 1996. Evolution of mammalian X-linked and autosomal Pgk and Pdh E1α subunit genes. Mol Biol Evol. 13:1023–1031.

Galtier N, Enard D, Radondy Y, Bazin E, Belkhir K. 2006. Mutation hot spots in mammalian mitochondrial DNA. Genome Res. 16:1–8.

Gantenbein B, Fet V, Gantenbein-Ritter IA, Balloux F. 2005. Evidence for recombination in scorpion mitochondrial DNA (Scorpiones: Buthidae). Proc R Soc B. 272:697–704.

Hill WG, Robertson A. 1968. Linkage disequilibrium in finite populations. Theor Appl Genet. 38:226–231.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.

Huelsenbeck JP, Ronquist F. 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754–755.

Innan H, Nordborg M. 2002. Recombination or mutational hot spots in human mtDNA? Mol Biol Evol. 19:1122–1127.

Jakobsen IB, Easteal S. 1996. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. CABIOS 12:291–295.

Kraytsberg V, Schwartz M, Brown TA, Ebralidse K, Kunz WS, Clayton DA, Vissing JKK. 2004. Recombination of human mitochondrial DNA. Science 304:981.

Ladoukakis ED, Zouros E. 2001. Direct evidence for homologous recombination in mussel Mytilus galloprovincialis mitochondrial DNA. Mol Biol Evol. 18:1168–1175.

Lawson MJ, Zhang L. 2009. Sexy gene conversions: locating gene conversions on the X-chromosome. Nucleic Acids Res. 37:4570–4579.

Lewontin RC. 1964. The interaction of selection and linkage. Genetics 49:49–67.

Lopez P, Casane D, Philippe H. 2002. Heterotachy, an important process of protein evolution. Mol Biol Evol. 19:1–7.

Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer.

Maynard Smith J. 1992. Analyzing the mosaic structure of genes. J Mol Evol. 34:126–129.

Maynard Smith J, Smith NH. 1998. Detecting recombination from gene trees. Mol Biol Evol. 15:590–599.

Maynard Smith J, Smith NH. 1999. Recombination in animal mitochondrial DNA. Mol Biol Evol. 19:2330–2332.

Neiman M, Taylor DR. 2009. The causes of mutation accumulation in mitochondrial genomes. Proc R Soc B. 276:1201–1209.

Pesole G, Gissi C, DeChirico A, Saccone C. 1999. Nucleotide substitution rate of mammalian mitochondrial genomes. J Mol Evol. 48:427–434.


Piganeau G, Gardner M, Eyre-Walker A. 2004. A broad survey of recombination in animal mitochondria. Mol Biol Evol. 21:2319–2325.

Posada D, Crandall KA. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci U S A. 98:13757–13762.

Ptak SE, Voelpel K, Przeworski M. 2004. Insights into recombination from patterns of linkage disequilibrium in humans. Genetics 167:387–397.

Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci. 13:235–238.

Ronquist F, Huelsenbeck JP. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

Sawyer SA. 1989. Statistical tests for detecting gene conversion. Mol Biol Evol. 6:526–538.

Sawyer SA. 1999. GENECONV: a computer package for the statistical detection of gene conversion. Distributed by the author. St. Louis (MO): Department of Mathematics, Washington University. Available from: http://www.math.wustl.edu/~sawyer.

Sneath PHA. 1995. The distribution of the random division of a molecular sequence. Binary 7:148–152.

Takahata N. 1994. Comments on the detection of reciprocal recombination or gene conversion. Immunogenetics 39:146–149.

Thyagarajan B, Padua RA, Campbell C. 1996. Mammalian mitochondria possess homologous DNA recombination activity. J Biol Chem. 271:27536–27543.

Tsaousis AD, Martin DP, Ladoukakis ED, Posada D, Zouros E. 2005. Widespread recombination in published animal mtDNA sequences. Mol Biol Evol. 22:925–933.

Ujvari B, Dowton M, Madsen T. 2007. Mitochondrial DNA recombination in a free-ranging Australian lizard. Biol Lett. 3:189–192.

Wagenmakers EJ, Farrell S. 2004. AIC model selection using Akaike weights. Psychonomic Bull Rev. 11:192–196.

Wang J, Gonzalez K, Scaringe W, Tsai K, Liu N, Gu D, Li W, Hill K, Sommer S. 2007. Evidence for mutation showers. Proc Natl Acad Sci U S A. 104:8403–8408.

White DJ, Gemmell NJ. 2009. Can indirect tests detect a known recombination event in human mtDNA? Mol Biol Evol. 26:1435–1439.

Wilson EB. 1916. The distribution of the chondriosomes to the spermatozoa in scorpions. Proc Natl Acad Sci U S A. 2:321–324.

Wiuf C, Christensen T, Hein J. 2001. A simulation study of the reliability of recombination detection methods. Mol Biol Evol. 18:1929–1939.

Worobey M. 2001. A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol Biol Evol. 18:1425–1434.

Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl BioSci. 13:555–556.

Yang Z. 2007. PAML 4: a program package for phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.


Supplementary Material

[Figure S1 plot not reproduced. Panel title: N = 15; L = 1,200 bp; p = middle. Columns: RET, PHI, GCG, GCI, MX, LDR, LDD. Rows: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: coldness factor; proportion of segregating sites. Vertical scale: 0–100.]

Figure S1: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800–1,000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

[Figure S2 plot not reproduced. Panel title: N = 30; L = 1,200 bp; p = middle. Columns: RET, PHI, GCG, GCI, MX, LDR, LDD. Rows: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: coldness factor; proportion of segregating sites. Vertical scale: 0–100.]

Figure S2: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800–1,000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

[Figure S3 plot not reproduced. Panel title: N = 15; L = 1,200 bp; p = side. Columns: RET, PHI, GCG, GCI, MX, LDR, LDD. Rows: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: coldness factor; proportion of segregating sites. Vertical scale: 0–100.]

Figure S3: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800–1,000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

[Figure S4 plot not reproduced. Panel title: N = 30; L = 1,200 bp; p = side. Columns: RET, PHI, GCG, GCI, MX, LDR, LDD. Rows: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: coldness factor; proportion of segregating sites. Vertical scale: 0–100.]

Figure S4: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800–1,000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

[Figure S5 plot not reproduced. Panel title: N = 15; L = 600 bp; p = middle. Labels: RET, PHI, GCG, GCI, MX, LDR, LDD. Panels: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: Coldness factor; Proportion of segregating sites. Scale: 0-100. Tick values: 0.05, 0.20, 0.35, 0.50 and 0.05, 0.20, 0.50, 1.00.]

Figure S5: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800-1000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

[Figure S6 plot not reproduced. Panel title: N = 30; L = 600 bp; p = middle. Labels: RET, PHI, GCG, GCI, MX, LDR, LDD. Panels: (n = 1/3 N, l = 1/3 L), (n = 1/3 N, l = 2/3 L), (n = 2/3 N, l = 1/3 L), (n = 2/3 N, l = 2/3 L). Axis labels: Coldness factor; Proportion of segregating sites. Scale: 0-100. Tick values: 0.05, 0.20, 0.35, 0.50 and 0.05, 0.20, 0.50, 1.00.]

Figure S6: False positives in Patchy-tachy simulations. N = sequences/simulation, L = length of sequences, p = position of cold spot in cold sequences, n = number of sequences in cold clade, l = length of cold spot. Black circles represent 800-1000 simulations which were tested for recombination, and grey circles mark data collected from fewer than 800 simulations.

Table S1: Description of data sets. Poly = polymorphic sites out of (# sites without gaps or ambiguous nucleotides).

| Data set | # sites | # samples | % poly (# sites) | Accession numbers |
| --- | --- | --- | --- | --- |
| Sulawesi macaques ND3-ND4-ND4L | 1203 | 18 | 24.63 (1202) | AF091400-AF091429 |
| Bursaphelenchus conicaudatus COI | 960 | 30 | 20.21 (960) | AB083711-39 |
| Micropterus salmoides CYTB | 1140 | 14 | 9.39 (1140) | AY115999; AY116000; AY225669-84 |
| Macaca nemestrina CYTB | 859 | 11 | 17.11 (859) | AF350388-91; AF350394-99 |
| Microtus longicaudus CYTB | 1140 | 68 | 18.86 (1140) | AF187160-230 |
| Vesicomya pacifica COI | 516 | 22 | 14.76 (515) | AF008287; AF008293-5; AF143290-304 |
| Mandrillus sphinx CYTB | 267 | 71 | 5.93 (253) | AF020423; AF301612-16; AY204763-827 |
| Libellula quadrimaculata COI | 416 | 15 | 8.79 (387) | AF228584-98 |
| Dendroica petechia ATP8-ATP6 | 842 | 11 | 1.66 (841) | AF382957; AY115297-306Y |
| Apodemus sylvaticus CYTB | 974 | 75 | 19.31 (875) | AF159395; AF60603; ASY98598; ASY98600; ASY98605; ASY311148; ASY511877-9; ASY511883-5; ASY511887; ASY511889-91; ASY511896-7; ASY511899; ASY511901; ASY511903-4; ASY511906-8; ASY511910-2; ASY511914-24; ASY511928-32; ASY511935-41; ASY511943-4; ASY511946-69; ASY511971-2 |
| Gomphiocephalus hodgsoni COI | 599 | 45 | 3.51 (598) | AY294562-606 |
| Alpheus lottini COI | 564 | 42 | 17.05 (563) | AF107049-68; AF309910; ALU76428-489 |
| Macrodon ancylodon CYTB | 810 | 46 | 5.24 (801) | AY253604-9; AY253611-23; AY253625-32; AY253634-7; AY253639-50; AY253652; AY253655-6D |
| Papio papio ND4-ND5 | 696 | 8 | 2.16 (695) | AY212049-56 |
| Campylorhynchus brunneicap ND2 | 298 | 60 | 8.90 (236) | AF291512-71 |
| Bradypodion occidentale ND2 | 987 | 8 | 2.73 (951) | AF448728; AY289868; AY289888; AY289907-11 |
| Passerella iliaca CYTB | 432 | 9 | 3.57 (392) | U40162-70 |
| Merlangius merlangus ATP6 | 878 | 13 | 1.59 (756) | AF526616-28 |
| Gonatus onyx COI | 657 | 7 | 1.98 (657) | AF000041; AF144718-23 |
| Grus antigone CYTB | 1143 | 9 | 1.49 (657) | U43618-25; U11060-1; U11064 |
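
The "% poly (# sites)" column of Table S1 can be illustrated with a minimal sketch. It is not the authors' code: the function name `percent_polymorphic` is hypothetical, and it assumes that an "ambiguous" site is any alignment column containing a character other than A, C, G, or T (gaps included), which the table caption does not spell out.

```python
from typing import List, Tuple

# Assumption: only these four characters count as unambiguous nucleotides.
UNAMBIGUOUS = set("ACGT")


def percent_polymorphic(alignment: List[str]) -> Tuple[float, int]:
    """Return (% polymorphic sites, number of usable sites).

    A usable site is an alignment column with no gaps or ambiguity codes.
    A polymorphic site is a usable column with more than one distinct base.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    usable = 0
    polymorphic = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column}
        if not bases <= UNAMBIGUOUS:
            continue  # skip columns with gaps or ambiguous characters
        usable += 1
        if len(bases) > 1:
            polymorphic += 1

    if usable == 0:
        return 0.0, 0
    return 100.0 * polymorphic / usable, usable


if __name__ == "__main__":
    toy = ["ACGTN", "ACGA-", "ATGTA"]
    pct, sites = percent_polymorphic(toy)
    print(f"{pct:.2f} ({sites})")
```

In the toy alignment the last column is dropped because it contains `N` and `-`, leaving four usable sites of which two vary, so the value is reported in the same "percentage (usable sites)" form as the table.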