AJHG_1997_61!6!1413-1423 Four MtDNA Haplogroups in New World Bonatto & Salzano


  • 7/29/2019 AJHG_1997_61!6!1413-1423 Four MtDNA Haplogroups in New World Bonatto & Salzano


    Am. J. Hum. Genet. 61:1413–1423, 1997


    Diversity and Age of the Four Major mtDNA Haplogroups, and Their Implications for the Peopling of the New World

    Sandro L. Bonatto and Francisco M. Salzano

    Departamento de Genetica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

    Summary

    Despite considerable investigation, two main questions on the origin of Native Americans remain the topic of intense debate: namely, the number and time of the migration(s) into the Americas. Using the 720 available Amerindian mtDNA control-region sequences, we reanalyzed the nucleotide diversity found within each of the four major mtDNA haplogroups (A–D) thought to have been present in the colonization of the New World. We first verified whether the within-haplogroup sequence diversity could be used as a measure of the haplogroups' age. The pattern of shared polymorphism, the mismatch distribution, the phylogenetic trees, the value of Tajima's D, and the computer simulations all suggested that the four haplogroups underwent a bottleneck followed by a large population expansion. The four haplogroup diversities were very similar to each other, offering strong support for their single origin. They suggested that the differentiation of the Native Americans' ancestral population began 30,000–40,000 years before the present (ybp), with a 95%-confidence-interval lower bound of about 25,000 ybp. These values are in good agreement with the New World settlement model that we have presented elsewhere, extending the results initially found for haplogroup A to the three other major groups of mtDNA sequences found in the Americas. These results place the peopling of the Americas clearly in an early, pre-Clovis time frame.

    Received March 4, 1997; accepted for publication October 1, 1997; electronically published November 26, 1997.

    Address for correspondence and reprints: Dr. Sandro L. Bonatto, Departamento de Genetica, Universidade Federal do Rio Grande do Sul, Caixa Postal 15053, 91501-970 Porto Alegre, RS, Brazil. E-mail: [email protected]

    © 1997 by The American Society of Human Genetics. All rights reserved.

    0002-9297/97/6106-0027$02.00

    Introduction

    The question of the origin of the indigenous peoples of the Americas has been the object of great debate. Some major problems have been slowly resolved since the last century, the main agreement achieved so far being that these peoples originated from migrations from Asia through the region of the Bering Strait ≥12,000 years before the present (ybp) (Cavalli-Sforza et al. 1994). However, the time and number of such migrations, as well as the size of the ancestral populations, remain important unsettled questions.

    Previous studies based on high-resolution RFLPs and control region (CR) sequences have shown that the great majority of the Native American mtDNAs screened so far could be classified into four distinct clusters, called haplogroups A–D (for a review of the RFLP data, see Wallace 1995; for the CR sequence data, see Forster et al. 1996). The distribution of these four haplogroups in the populations that spoke the three main sets of languages found in the Americas (Amerind, Na-Dene, and Eskaleut), as well as the estimates of the internal diversity of each haplogroup, led to several hypotheses regarding the number and age of the migration(s) that colonized the New World. Although Amerind populations in general have all four haplogroups, Na-Dene and Eskimo groups have mainly sequences from haplogroup A (Merriwether et al. 1995; Wallace 1995). Moreover, using RFLP data, Torroni et al. (1992, 1993, 1994) found a much lower haplogroup A sequence diversity in the Na-Dene than in the Amerinds, whereas, within Amerinds, haplogroup B had a lower sequence diversity than the other three haplogroups. These results led these authors to suggest that the Na-Dene entered the continent by means of an independent migration (Wallace 1995). The Amerinds, on the other hand, would have migrated to the Americas in two waves: the more ancient carried haplogroups A, C, and D, whereas the more recent carried only haplogroup B sequences. On the basis of the mean diversity found in the haplogroups, Wallace and co-workers dated the major Amerind migration into the Americas to 26,000–34,000 ybp, the haplogroup B migration to 12,000–15,000 ybp, and the Na-Dene migration to 7,000–9,000 ybp (Wallace 1995). In contrast, Horai et al. (1993), using CR sequence data, postulated that each major haplogroup would represent a separate migration, occurring 14,000–21,000 ybp.

    We have recently shown (Bonatto and Salzano 1997), on the other hand, that mtDNA sequence data strongly support a single and early (>20,000 ybp) origin for the Amerinds, Na-Dene, and Eskimo, in agreement with other molecular studies (Merriwether et al. 1995; Forster et al. 1996; Kolman et al. 1996). Our results were based mainly on the analysis of the CR sequences from haplogroup A, since it is the only haplogroup widely distributed among all Native Americans (Merriwether et al. 1995). It remained unknown whether the other three major Native American haplogroups would show the same picture. Also, although most studies on the problem of dating the colonization of the Americas have used sequence diversity as a measure of age, few (e.g., Bonatto and Salzano 1997) have investigated whether their samples met the very stringent assumptions required by this practice (Rogers and Jorde 1995).

    The goals of the present study were to analyze the diversity of the four major Native American mtDNA haplogroups and to evaluate the implications of these results for the estimation of the number and age of the migration(s) to the Americas. Specifically, we tested the hypothesis that one or more of the haplogroups may represent different migrations to the continent (e.g., Horai et al. 1993; Wallace 1995). It should be noted that here we will not test the hypothesis of different migrations based on linguistic groups (Amerinds, Na-Dene, and Eskimo), since this has been done elsewhere (Bonatto and Salzano 1997). The analyses involved two data sets, one including 720 Native Americans, with their hypervariable segment I (HVS-I) sequences, and another composed of 217 individuals, with their HVS-I–HVS-II sequences. Several methods were applied, including computer simulations over a wide range of demographic scenarios, to determine whether the data were consistent with a bottleneck followed by a large population expansion. Finally, we calculated the within-haplogroup nucleotide-diversity values and, using appropriate substitution rates and methods, estimated their mean ages and 95% confidence intervals (CIs).

    Subjects and Methods

    Population Samples

    All available CR sequences from Native Americans were employed, with the exception of the two populations described by Horai et al. (1993), since they were not sequenced for the first 100 bases of HVS-I. Also, only some sequences from Easton et al.'s (1996) Yanomami were used, since many of them present several unusual features that preclude their utilization until they are further investigated (authors' unpublished data). The other sequences have the complete set of nucleotides for HVS-I (positions 16024–16383 [numbering is according to Anderson et al. 1981]) or HVS-I–HVS-II (positions 45–390 for HVS-II), or only a small fraction of them are missing. Two data sets were assembled, one with HVS-I sequences and the other with HVS-I–HVS-II sequences. The sequences were aligned by hand, and insertions in relation to the reference sequence (Anderson et al. 1981) were not considered.

    For the HVS-I, the Native American sample consists of 720 individuals from a total of 24 populations (with sample sizes n ≥ 5) from North, Central, and South America, as follows. For South America (n = 318): Xavante (n = 25), Zoro (n = 30), and Gaviao (n = 27) (Ward et al. 1996); Wai Wai (n = 26) and Surui (n = 24) (authors' unpublished data); Mapuche (n = 39) (Ginther et al. 1993); Yanomama (n = 27), Wayampi (n = 21), Kayapo (n = 13), Arara (n = 9), Katuena (n = 9), Poturujara (n = 9), Awa-Guaja (n = 2), and Tiriyo (n = 2) (Santos et al. 1996); Yanomami (n = 50) (Easton et al. 1996); and Colombian mummies (n = 5) (Monsalve et al. 1996). For Central America (n = 136): Huetar (n = 27) (Santos et al. 1994); Ngobe (n = 46) (Kolman et al. 1995); and Kuna (n = 63) (Batista et al. 1995). For North America (n = 228): Nuu-Chah-Nulth (n = 63) (Ward et al. 1991); Bella Coola (n = 40) and Haida (n = 41) (Ward et al. 1993); and Yakima (n = 42), Athapascan (n = 21), Inupiaq Eskimo (n = 5), and western Greenland Eskimo (n = 16) (Shields et al. 1993). The 38 individuals whose mtDNA Torroni et al. (1993) sequenced from several populations all over the Americas were also included.
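    As a quick consistency check on the sample sizes listed above, the regional counts can be summed. The dictionary below only transcribes the n values given in this section; it is an illustrative sketch, not part of the original analysis.

```python
# Sample sizes (n) for the HVS-I data set, transcribed from the text above.
SAMPLES = {
    "South America": {
        "Xavante": 25, "Zoro": 30, "Gaviao": 27, "Wai Wai": 26, "Surui": 24,
        "Mapuche": 39, "Yanomama": 27, "Wayampi": 21, "Kayapo": 13,
        "Arara": 9, "Katuena": 9, "Poturujara": 9, "Awa-Guaja": 2,
        "Tiriyo": 2, "Yanomami": 50, "Colombian mummies": 5,
    },
    "Central America": {"Huetar": 27, "Ngobe": 46, "Kuna": 63},
    "North America": {
        "Nuu-Chah-Nulth": 63, "Bella Coola": 40, "Haida": 41, "Yakima": 42,
        "Athapascan": 21, "Inupiaq Eskimo": 5, "western Greenland Eskimo": 16,
    },
}
TORRONI_1993 = 38  # individuals from several populations (Torroni et al. 1993)

# Per-region totals, the grand total, and the number of groups with n >= 5.
region_totals = {region: sum(pops.values()) for region, pops in SAMPLES.items()}
grand_total = sum(region_totals.values()) + TORRONI_1993
groups_n_ge_5 = sum(1 for pops in SAMPLES.values() for n in pops.values() if n >= 5)

print(region_totals)  # {'South America': 318, 'Central America': 136, 'North America': 228}
print(grand_total)    # 720
print(groups_n_ge_5)  # 24
```

    The three regional sums match the stated 318, 136, and 228, and adding the 38 Torroni et al. (1993) individuals gives the 720 used throughout.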

    For HVS-I–HVS-II, sequences were available from a total of 217 individuals from the Huetar (Santos et al. 1994), Ngobe (Kolman et al. 1995), Mapuche (Ginther et al. 1993), and Yanomami (Easton et al. 1996) and from 24 Surui, 26 Wai Wai, 3 Xavante, 1 Gaviao, and 1 Zoro (authors' unpublished data).

    Phylogenetic Analysis

    Several DNA distances were used in the tree constructions, from the simplest (proportion of differences) to the most complex (Tamura-Nei gamma [Tamura and Nei 1993]), but all gave essentially the same results; therefore, only those with the Kimura two-parameter (K2P [Kimura 1980]) distance are presented. Because of the large number of sequences used, trees were constructed with the neighbor-joining (NJ) method, by use of the Njboot program (N. Takezaki; available at Internet address http://iubio.bio.indiana.edu). The interior-branch-test confidence probability (CP) values for branches in the trees (Rzhetsky and Nei 1992) were estimated by the CheckSzDv program (from the TreePack package [I. Belyi; http://trantor.cse.psu.edu/belyi]), by means of the pairwise option and the K2P distance. Minimum-spanning trees (Excoffier et al. 1992) were also constructed, by means of the Minspnet program (L. Excoffier; ftp://acasun1.unige.ch/pub/comp/win).
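    The K2P distance used for the trees separates transitions (purine–purine or pyrimidine–pyrimidine changes) from transversions. A minimal sketch, assuming two already-aligned sequences of equal length and simply skipping gaps or ambiguous bases (how Njboot treats such sites is not described here):

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions
    of transitional and transversional differences. Gaps/ambiguities are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # Same chemical class (both purines or both pyrimidines) = transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One C->T transition in 10 sites: d = -1/2 ln(0.8), about 0.1116.
print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

    The example sequences are hypothetical; a full analysis would compute this distance for every pair and pass the matrix to an NJ routine.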

    Diversity and Divergence Estimates

    The nucleotide diversity within and between haplogroups was calculated by means of the Sendbs program (N. Takezaki; http://iubio.bio.indiana.edu). Several DNA distances were calculated, and the standard error (SE) values of these estimates were obtained with a bootstrap approach with 1,000 replications over sites. The 95% CIs for the diversity and divergence values were calculated as ±2 SE. The 95% CI for the time of origin (expansion) of the haplogroups, on the basis of the nucleotide diversity values, was estimated as described by Bonatto and Salzano (1997), by use of their formula 1 (modified from Redd et al. 1995) for the calculation of the minimum SE of the time (TMSE) and by use of ±2 TMSE for the lower- and upper-bound values. Note that our 95% CI takes into account both the nucleotide-diversity and the mutation-rate errors.
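    Nucleotide diversity is the mean pairwise distance among all individuals, and its SE can be obtained by resampling alignment columns. The sketch below mirrors the 1,000-replicate bootstrap over sites and the ±2 SE interval described above, but it uses the simple proportion of differences (p-distance) rather than the Tamura-Nei gamma distance used for Table 1, and it takes per-sequence weights as an input (for example, between-population counts). It is not the Sendbs implementation.

```python
import random
import statistics
from itertools import combinations


def p_distance(a, b, sites):
    """Proportion of the given alignment columns at which a and b differ."""
    return sum(1 for i in sites if a[i] != b[i]) / len(sites)


def nucleotide_diversity(seqs, counts, sites):
    """Mean pairwise distance among all n = sum(counts) individuals.

    seqs[k] is a distinct sequence carried by counts[k] individuals; pairs of
    identical copies contribute a distance of zero.
    """
    n = sum(counts)
    total = 0.0
    for i, j in combinations(range(len(seqs)), 2):
        total += counts[i] * counts[j] * p_distance(seqs[i], seqs[j], sites)
    return total / (n * (n - 1) / 2)


def diversity_with_ci(seqs, counts, replicates=1000, seed=1):
    """Point estimate, bootstrap SE over sites, and a +/- 2 SE interval."""
    length = len(seqs[0])
    estimate = nucleotide_diversity(seqs, counts, list(range(length)))
    rng = random.Random(seed)
    boot = []
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        sites = [rng.randrange(length) for _ in range(length)]
        boot.append(nucleotide_diversity(seqs, counts, sites))
    se = statistics.stdev(boot)
    return estimate, se, (estimate - 2 * se, estimate + 2 * se)


seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGCAC"]  # hypothetical haplotypes
counts = [3, 1, 2]  # hypothetical weights for each haplotype
pi, se, ci = diversity_with_ci(seqs, counts)
print(f"pi = {pi:.4f}, SE = {se:.4f}, 95% CI = {ci[0]:.4f}-{ci[1]:.4f}")
```

    Swapping `p_distance` for a gamma-corrected distance would change the values but not the resampling logic.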

    For the time estimates, we need the substitution rates for the HVS-I and HVS-I–HVS-II regions, as well as their SEs. For HVS-I, we used the slow and fast rates given by Bonatto and Salzano (1997): 10.3% (±1.35%)/million years (Myr) and 15% (±1.97%)/Myr. For the HVS-I–HVS-II data sets, we used the following two rates: 8.85% (±0.9%)/Myr and 11.5% (±1.15%)/Myr. Both slow rates were taken from Horai et al. (1995), and the fast rates were taken from Ward et al. (1991) (in the case of HVS-I) and Stoneking et al. (1992) (in the case of HVS-I–HVS-II), whereas the SEs were either those given by Horai et al. (1995) or estimated by use of their approach.
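    The mean ages in Table 1 are consistent with the simple relation $T = \pi / (2\mu)$, where π is the within-haplogroup diversity and μ the substitution rate per lineage per site per year (10.3%/Myr corresponds to 1.03 × 10^-7). That this is exactly the formula behind the table is an assumption, inferred from the fact that it reproduces the tabled means to within the rounding of the published diversities; the CI calculation (formula 1 of Bonatto and Salzano 1997) is not reproduced here.

```python
def expansion_age(diversity_pct: float, rate_pct_per_myr: float) -> float:
    """Years since expansion, assuming T = pi / (2 * mu).

    diversity_pct: within-haplogroup nucleotide diversity, in percent.
    rate_pct_per_myr: substitution rate per lineage, in percent per million years.
    """
    pi = diversity_pct / 100
    mu_per_year = rate_pct_per_myr / 100 / 1e6
    return pi / (2 * mu_per_year)


# HVS-I, haplogroup A (diversity .84%) at the slow and fast rates.
for rate in (10.3, 15.0):
    print(f"{rate}%/Myr: {expansion_age(0.84, rate):,.0f} years")
# 10.3%/Myr: 40,777 years   (Table 1: 41,014)
# 15.0%/Myr: 28,000 years   (Table 1: 28,163)
```

    The small gaps to Table 1 are presumably due to the diversity being rounded to two decimals: 41,014 years at 10.3%/Myr implies a diversity of about .845%.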

    The α parameter of the gamma distribution for our data sets was calculated by means of Yang and Kumar's (1996) method and the Pamp program (from the Paml package [Z. Yang; http://iubio.bio.indiana.edu]), using trees built with the K2P distance.

    Mismatch Distributions

    The evolutionary history of the four haplogroups was also examined by use of the mismatch-distribution approach (Rogers and Harpending 1992; Rogers 1995; Rogers and Jorde 1995). The relevant parameters were calculated by means of the method of moments (Rogers 1995), with the Mmest program (from the Mismatch package [A. Rogers; ftp://anthro.utah.edu]). The 95% CI for the time of expansion of each haplogroup was estimated in a manner similar to that used in the nucleotide-diversity approach presented above, as described elsewhere (Bonatto and Salzano 1997).
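    A sketch of the mismatch distribution and of Rogers's (1995) method-of-moments estimators for a sudden expansion: with mean m and variance v of the pairwise differences, $\hat\theta_0 = \sqrt{v - m}$ and $\hat\tau = m - \hat\theta_0$, and the expansion time in generations is $\tau/(2u)$. This is not the Mmest code; using the variance over all pairs (dividing by the number of pairs) and setting θ0 to zero when v < m are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Counter mapping i -> number of sequence pairs that differ at i sites."""
    dist = Counter()
    for a, b in combinations(seqs, 2):
        dist[sum(x != y for x, y in zip(a, b))] += 1
    return dist


def moment_estimates(dist):
    """Rogers (1995) method of moments for a sudden expansion: (tau, theta0)."""
    pairs = sum(dist.values())
    mean = sum(i * c for i, c in dist.items()) / pairs
    var = sum(c * (i - mean) ** 2 for i, c in dist.items()) / pairs
    theta0 = math.sqrt(var - mean) if var > mean else 0.0
    return mean - theta0, theta0


def expansion_generations(tau, u):
    """Generations since expansion, t = tau / (2u); u is per generation for the region."""
    return tau / (2 * u)


dist = mismatch_distribution(["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "TTTAAA"])
# Relative frequencies F_i, as plotted in a mismatch-distribution figure.
freqs = {i: c / sum(dist.values()) for i, c in sorted(dist.items())}
tau, theta0 = moment_estimates(dist)
print(freqs, tau, theta0)
```

    The sequences are hypothetical; with real data, τ would be converted to years using the per-generation rate u for the region and an assumed generation length.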

    Simulations

    Rogers and Jorde (1995) showed that the only sense in which sequence diversity can be construed as a measure of age is as an estimate of the time during which a population has expanded since a severe bottleneck. Although the peopling of the Americas was probably characterized by a population reduction followed by expansion (Bonatto and Salzano 1997), there is considerable uncertainty about the sizes of the founding and the more recent pre-Columbian populations (Cavalli-Sforza et al. 1994). Besides, we are dealing here with groups of sequences (haplogroups), not with distinct populations. Thus, we also want to test the assumptions of a small founding population and a large expansion, for each haplogroup independently, so that we may apply dating methods that use sequence diversity as a measure of age.

    Following the work by Eller and Harpending (1996), we designed simulations of various scenarios of stationary and expanding populations, to test (1) under which conditions the empirical estimates would be reproduced and (2) whether we could reject or accept the hypothesis that the haplogroups were stationary or had expanded and, if the latter was true, to what degree. Specifically, we tested under which demographic scenarios the simulations would give values at least as extreme as the ones estimated directly from the samples, for two statistics: Harpending's raggedness (r; Harpending 1994) and Tajima's D (Tajima 1989).

    Raggedness quantifies the smoothness of a distribution: the smaller the value, the smoother the (mismatch) distribution. Harpending et al. (1993) found that expanding populations showed very small r values, since their mismatch distributions have the shape of a smooth wave. However, Aris-Brosou and Excoffier (1996) showed that a high heterogeneity of substitution rates among sites (a lower α parameter for the gamma distribution) may cause a stationary population to exhibit very smooth distributions. Therefore, in such cases, simulations based on the raggedness of a distribution may not readily distinguish between stationary and expanding scenarios. Moreover, they also showed that, although (large) population expansions shift Tajima's D to (significantly) negative values, substitution-rate heterogeneity has the opposite effect, moving Tajima's D to more-positive values for more-uneven substitution rates. Since the mtDNA CR in humans is known to have substitution-rate heterogeneity (Kocher and Wilson 1991; Wakeley 1993), Tajima's D may be a better statistic than the r used by Eller and Harpending (1996) in their simulations for distinguishing between stationary and expanding populations.
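    Both summary statistics can be computed directly from aligned sequences. A minimal sketch, assuming the infinite-sites convention of counting each segregating site once (regardless of how many states it shows) and a sample of at least four sequences. Raggedness follows Harpending's (1994) definition, $r = \sum_{i=1}^{d+1} (x_i - x_{i-1})^2$, where $x_i$ is the relative frequency of pairs differing at i sites, d is the largest observed difference, and $x_{d+1} = 0$.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_differences(seqs):
    """Number of differing sites for every pair of sequences."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]


def raggedness(seqs):
    """Harpending's (1994) raggedness index of the mismatch distribution."""
    diffs = Counter(pairwise_differences(seqs))
    pairs = sum(diffs.values())
    d = max(diffs)
    # x[i] = relative frequency of pairs differing at i sites; x[d + 1] = 0.
    x = [diffs[i] / pairs for i in range(d + 2)]
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, d + 2))


def tajimas_d(seqs):
    """Tajima's (1989) D (needs n >= 4 and at least one segregating site)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    diffs = pairwise_differences(seqs)
    k = sum(diffs) / len(diffs)  # mean number of pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical star-like sample: every derived sequence carries one private change.
star = ["AAAAAAAA", "TAAAAAAA", "ATAAAAAA", "AATAAAAA", "AAATAAAA"]
print(raggedness(star), tajimas_d(star))  # r = 0.56, D is about -1.09
```

    The star-shaped toy sample, an excess of singletons as expected after an expansion, yields a negative D, the direction discussed in the text.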

    The simulations were performed by the Mmgen program (from the Mismatch package; see above), which uses the coalescent model to generate simulated histories by assuming some input parameters, such as size of the mismatch distribution, number of sites, sample size, and time since the expansion (for details of the coalescent algorithm, see Rogers et al. 1996). All the above input parameters were calculated from the empirical data for each haplogroup, so that the simulations mirrored as closely as possible the actual demographic parameters for each haplogroup. The other parameters of interest are the degree of expansion of the population and its final size, in units of θ, where $\theta = 2N_f u$, with $N_f$ denoting the number of females and $u$ denoting the per-generation mutation rate for the nucleotide region (see Rogers and Harpending 1992).

    We modified the program so that it generated empirical distributions of 10,000 D and r values for each combination of final θ and degree of expansion. Final θ ranged from 0.1 to 1,000, and the degree of expansion ranged from 1 (for a stationary population) to 100,000,000. Most of the simulations were performed considering only one final random-mating population, but, to test whether geographic population structure could influence the results, some simulations were also generated considering that, after expansion, the population would split into 3 or 20 groups. Besides the infinite-sites model, we also ran simulations with a mutation model that takes into account the mutation-rate heterogeneity in the mtDNA CR, namely the finite-sites model with gamma-distributed rates (Rogers et al. 1996). The α parameters of the gamma distribution used in the simulations were those calculated for our data sets, as described above. As in the work of Eller and Harpending (1996), a specific scenario of final θ and degree of expansion was rejected if ≤500 (≤5%) simulations showed a D or r value more extreme than that calculated from the data.
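    The rejection rule, together with the way the maximum initial θ is derived from the scenarios that survive it (dividing final θ by the degree of expansion, as described under Simulation Results), can be written compactly. In this sketch the coalescent simulator is a hypothetical callable `simulate(final_theta, expansion, n_sims)` standing in for the modified Mmgen program, and "more extreme" is taken to mean a lower D; only the decision logic is implemented.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical simulator signature: (final theta, expansion, n_sims) -> simulated D values.
Simulator = Callable[[float, float, int], Sequence[float]]


def scenario_rejected(observed_d: float, simulated_d: Sequence[float],
                      alpha: float = 0.05) -> bool:
    """Reject if at most alpha of the simulated D values are lower than observed."""
    more_extreme = sum(1 for d in simulated_d if d < observed_d)
    return more_extreme / len(simulated_d) <= alpha


def scan_grid(observed_d: float, final_thetas: Sequence[float],
              expansions: Sequence[float], simulate: Simulator,
              n_sims: int = 10_000) -> List[Tuple[float, float]]:
    """Return the (final theta, degree of expansion) pairs that are NOT rejected."""
    kept = []
    for theta in final_thetas:
        for factor in expansions:
            if not scenario_rejected(observed_d, simulate(theta, factor, n_sims)):
                kept.append((theta, factor))
    return kept


def max_initial_theta(kept: Sequence[Tuple[float, float]]) -> float:
    """Initial theta = final theta / degree of expansion; keep the largest."""
    return max(theta / factor for theta, factor in kept)


def toy_simulate(theta, factor, n_sims):
    # Hypothetical stand-in: large expansions give strongly negative D.
    value = -3.0 if factor >= 100 else 0.0
    return [value] * n_sims


kept = scan_grid(-2.363, [10, 100], [1, 100, 1000], toy_simulate, n_sims=100)
print(kept)                     # [(10, 100), (10, 1000), (100, 100), (100, 1000)]
print(max_initial_theta(kept))  # 1.0
```

    Using a ratio test (`<= alpha`) rather than a fixed count of 500 keeps the rule valid for any number of simulations.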

    Results

    Of the 720 Native American individuals sequenced for HVS-I, 592 (82%, comprising 125 different sequences) have sequences with all the markers for one of the four haplogroups (the marker substitutions for the four major haplogroups are those listed by Forster et al. 1996 as the founding sequences A2, B, C, and D1). For HVS-I–HVS-II, this value is 161/217 (74%, comprising 52 different sequences). If we exclude Easton et al.'s (1996) Yanomami sample, these values are 87% for HVS-I and 89% for HVS-I–HVS-II. To minimize the possibility of multiple, yet closely related, founding sequences in each haplogroup, which would result in overestimation of the diversity accumulated since colonization, we used for each haplogroup only the sequences that have all its marker substitutions. By doing this we tried to ensure that all sequences analyzed here were derived from just one founding sequence per haplogroup. Also, we used for haplogroup A sequences from Amerinds, Na-Dene, and Eskimo, since we have shown elsewhere (Bonatto and Salzano 1997) that they all have a common origin. However, it is important to note that the diversity values for haplogroup A (see below) did not change much if we removed the non-Amerind sequences.

    A striking feature of the two data sets is that, for each haplogroup, the polymorphisms, especially at HVS-I, either exist in only one sequence or are shared by a small fraction of the sequences, with no substitution occurring in >20% of the haplogroup sequences (not shown). According to Slatkin and Hudson (1991), this is exactly the pattern expected under exponential growth from a single ancestral sequence (from which the descendant sequences inherited the marker substitutions).
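    This shared-polymorphism pattern can be tabulated by comparing each sequence with the haplogroup's founding sequence. A sketch, assuming the founding sequence is known (here a hypothetical string) and that any difference from it at a site counts as a derived state:

```python
def derived_counts(founder, seqs):
    """Map site -> number of sequences whose base differs from the founder there."""
    counts = {}
    for pos, ancestral in enumerate(founder):
        derived = sum(1 for s in seqs if s[pos] != ancestral)
        if derived:
            counts[pos] = derived
    return counts


founder = "AAAAAAAA"  # hypothetical haplogroup-founding sequence
sample = ["AAAAAAAA", "TAAAAAAA", "ATAAAAAA", "AATAAAAA", "TAAATAAA",
          "AAAAAAAA", "AAAAAATA", "AAAAAAAA", "AAAAAAAT", "AAAAAAAA"]

counts = derived_counts(founder, sample)
singletons = sum(1 for c in counts.values() if c == 1)  # substitutions in one sequence
max_fraction = max(counts.values()) / len(sample)       # most widely shared substitution
print(singletons, max_fraction)  # 5 0.2
```

    Under the star-like expansion described above, most entries would be singletons and `max_fraction` would stay low.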

    Haplogroups' Nucleotide Diversity

    One important requirement in coalescence theory (Donnelly and Tavare 1995) and in the mismatch-distribution methods (Rogers and Jorde 1995) is the use of a random sample of genes from the population under study. The population relevant to the problem in which we are interested, the peopling of the Americas, is the entirety of Native Americans, not the local groups. However, two individuals from the same local population will have a much higher probability of being closely related than will two individuals from different populations, especially if we consider the generally small sizes of the Native American local populations (Salzano and Callegari-Jacques 1988). The use of the within-local-population frequency of the sequences, highly affected by each population's specific recent demographic history, will underestimate the nucleotide diversity of Native Americans as a whole. On the other hand, the occurrence of the same sequence in different populations is more likely to have been affected by more-ancient events. Therefore, since we are interested in the early evolutionary history of the continent, for the estimation of the within-haplogroup nucleotide diversity we used the between-populations frequency of the sequences (also see Bonatto and Salzano 1997). To estimate the between-populations frequency of each different sequence in the data set, we counted only the number of populations in which it occurred, disregarding its frequency within the populations.
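    Counting between-population frequencies amounts to recording, for each distinct sequence, the set of populations in which it was observed. A minimal sketch with hypothetical sequence labels; the resulting counts could serve as per-sequence weights in a diversity calculation:

```python
from collections import defaultdict


def between_population_counts(samples):
    """samples: iterable of (population, sequence) pairs, one per individual.

    Returns, for each distinct sequence, the number of populations carrying it,
    ignoring how many individuals carry it within each population.
    """
    populations = defaultdict(set)
    for population, sequence in samples:
        populations[sequence].add(population)
    return {sequence: len(pops) for sequence, pops in populations.items()}


# Hypothetical labels: "hapA1" is common in one population but also found in another.
individuals = [
    ("Xavante", "hapA1"), ("Xavante", "hapA1"), ("Xavante", "hapA1"),
    ("Zoro", "hapA1"), ("Zoro", "hapA2"), ("Gaviao", "hapB1"),
]
print(between_population_counts(individuals))  # {'hapA1': 2, 'hapA2': 1, 'hapB1': 1}
```

    Here the three Xavante copies of `hapA1` count once, which is what prevents a single population's recent history from dominating the estimate.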

    Table 1 shows the nucleotide diversities (with their 95% CIs) for the four haplogroups, for both HVS-I and HVS-I–HVS-II. The remarkable feature is the high similarity of the values within each data set, especially for HVS-I (with larger sample sizes), which range from 0.80% to 0.96%, with a mean of 0.86%. Note that, by contrast, the studies with RFLPs (Torroni et al. 1994) found that haplogroup B had a much lower diversity than the other three. For the RFLP data, the mean diversity of the other three haplogroups is 2.2 times higher than the haplogroup B diversity, whereas this ratio for the CR data is only 1.1. This ratio for the CR data is maintained even when the within-population frequency of the sequences is used, the sample size being 563 individuals in this case (not shown), which is larger than the 335 individuals used in the RFLP studies. Therefore, this difference between CR sequences and RFLP data can be explained neither by sample size nor by the different ways in which the haplotype frequencies were treated; it is more probably due to the different populations, regions of the mtDNA studied, or haplogroup definitions. Diversity values for the HVS-I–HVS-II data set were more variable, possibly because of smaller sample sizes and, perhaps, the higher mutation-rate heterogeneity in HVS-II (see below).

    Table 1

    Nucleotide Diversity and Age Estimates for Native American mtDNA Haplogroups

    Haplogroup     No. of        No. of       Nucleotide Diversity(a)   Mean Age (95% CI) (years)
                   Individuals   Sequences    (95% CI) (%)              Slow Rate                   Fast Rate

    HVS-I (slow rate 10.3%/Myr; fast rate 15%/Myr):
    A              71            45           .84 (.72–.97)             41,014 (34,887–47,142)      28,163 (23,949–32,377)
    B              45            31           .80 (.71–.89)             39,017 (36,240–44,569)      26,791 (22,972–30,611)
    C              36            25           .84 (.68–.99)             40,680 (34,150–47,210)      27,933 (23,443–32,423)
    D              36            24           .96 (.77–1.16)            46,778 (38,979–54,576)      32,121 (26,759–37,482)
    Average(b)     188           125          .86 (.79–.93)(c)          41,576 (35,869–47,283)      28,549 (24,623–32,475)
    Divergence(d)                             2.67 (2.42–2.92)(c)       129,469 (111,458–147,480)   88,902 (76,513–101,292)

    HVS-I–HVS-II (slow rate 8.85%/Myr; fast rate 11.5%/Myr):
    A              17            17           .83 (.76–.89)             46,635 (41,504–51,765)      35,889 (31,940–39,837)
    B              16            15           .71 (.62–.80)             40,144 (35,360–44,929)      30,894 (27,212–34,576)
    C              11            10           .64 (.45–.82)             36,034 (29,615–42,453)      27,730 (22,790–32,670)
    D              10            10           .83 (.74–.92)             46,999 (41,610–52,388)      36,169 (32,022–40,316)
    Average(b)     54            52           .75 (.70–.81)(c)          42,620 (38,029–47,211)      32,799 (29,266–36,332)
    Divergence(d)                             2.03 (1.92–2.13)(c)       114,639 (102,604–126,674)   88,222 (78,961–97,484)

    (a) Tamura-Nei gamma distance, for the α values given in the text.
    (b) Weighted by the number of sequences in each haplogroup.
    (c) SEs for the weighted averages were calculated as the square root of the sum of the squared weighted SEs from the individual comparisons.
    (d) Weighted average of the pairwise haplogroup divergence.
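    Footnotes b and c of Table 1 describe a sequence-weighted average and its SE, $SE = \sqrt{\sum_i (w_i\,SE_i)^2}$. The sketch below applies them to the rounded HVS-I diversities, recovering each SE as one-quarter of the published 95% CI width (since the CIs are ±2 SE). With these rounded inputs it gives .853 rather than the published .86; the gap presumably reflects rounding of the haplogroup values, since the weighted mean of the unrounded ages corresponds to a diversity of about .856.

```python
import math

# HVS-I values from Table 1: (number of sequences, diversity %, CI lower, CI upper).
hvs1 = {
    "A": (45, 0.84, 0.72, 0.97),
    "B": (31, 0.80, 0.71, 0.89),
    "C": (25, 0.84, 0.68, 0.99),
    "D": (24, 0.96, 0.77, 1.16),
}

total_seqs = sum(n for n, *_ in hvs1.values())
weights = {h: n / total_seqs for h, (n, *_) in hvs1.items()}

# Footnote b: average weighted by the number of sequences.
average = sum(weights[h] * div for h, (_, div, _, _) in hvs1.items())
# Footnote c: square root of the sum of squared weighted SEs (SE = CI width / 4).
se = math.sqrt(sum((weights[h] * (hi - lo) / 4) ** 2
                   for h, (_, _, lo, hi) in hvs1.items()))

print(f"average = {average:.3f}%, 95% CI = {average - 2 * se:.2f}-{average + 2 * se:.2f}%")
# average = 0.853%, 95% CI = 0.78-0.92%   (Table 1: .86, .79-.93)
```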

    The between-haplogroup divergence values are much higher than the within-haplogroup diversities: the average over all pairwise comparisons is more than three times (for HVS-I) and more than two times (for HVS-I–HVS-II) higher than the within-haplogroup averages (table 1). This result supports the notion that the haplogroups' divergence (not diversification) began well before they entered the Americas and that any analysis that lumps together different haplogroups (e.g., the mismatch distributions of Horai et al. 1993 and Aris-Brosou and Excoffier 1996) will yield results reflecting events that occurred much earlier than the colonization of the New World.
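    The ratios quoted above (about 1.1 between the mean of haplogroups A, C, and D and haplogroup B for the CR data, and divergence-to-diversity ratios of more than three and more than two) follow directly from the rounded values in Table 1:

```python
from statistics import mean

# Rounded HVS-I diversities (%) from Table 1.
hvs1 = {"A": 0.84, "B": 0.80, "C": 0.84, "D": 0.96}
ratio_b = mean(hvs1[h] for h in "ACD") / hvs1["B"]

# Average between-haplogroup divergence vs. average within-haplogroup diversity.
hvs1_fold = 2.67 / 0.86
hvs1_hvs2_fold = 2.03 / 0.75

print(f"{ratio_b:.1f} {hvs1_fold:.2f} {hvs1_hvs2_fold:.2f}")  # 1.1 3.10 2.71
```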

    Table 2

    Summary Statistics for the Four Native American mtDNA CR Haplogroups, Based on the HVS-I Data Set

    Haplogroup   Tajima's D   r       θ0(a)   Nf0(b), 10.3%/Myr   Nf0(b), 15%/Myr
    A            −2.363*      .0496   .25     135                 93
    B            −2.359*      .0414   .3      162                 111
    C            −2.226*      .0498   1.0     539                 370
    D            −2.060*      .0294   2.0     1,078               740

    NOTE. Number of individuals and number of sequences are as in table 1.
    (a) Maximum value of θ not rejected by the simulations.
    (b) Effective number of females in the initial population, calculated as θ0/(2u).
    * P < .05.

    In light of both the existence of rate heterogeneity in the mtDNA CR (Wakeley 1993) and its effects on the estimation of the true DNA distance (Yang 1996), the diversity values were calculated by means of the Tamura-Nei gamma distance. The α parameter used in the calculations was estimated by Yang and Kumar's (1996) new method. For the Native American HVS-I and HVS-I–HVS-II data sets, the α values were .5 and .16, respectively, in close agreement with those calculated for similar data sets (Wakeley 1993; Yang and Kumar 1996). The lower value for HVS-I–HVS-II is due to the extremely low value for the HVS-II region, which alone has an estimated α of .07.
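    To see why a small α matters, it helps to look at a gamma-corrected distance. The Tamura-Nei gamma formula used here is lengthy, so this sketch swaps in the simpler Jukes-Cantor distance with gamma-distributed rates, $d = \tfrac{3}{4}\alpha\left[(1 - \tfrac{4}{3}p)^{-1/\alpha} - 1\right]$, which approaches the ordinary Jukes-Cantor distance $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$ as α grows large. It illustrates the direction of the effect only, not the values behind Table 1.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.02  # hypothetical proportion of differing sites
print(f"no rate variation: {jc_distance(p):.4f}")
for alpha in (0.5, 0.16, 0.07):  # alpha values reported for HVS-I, HVS-I-HVS-II, HVS-II
    print(f"alpha = {alpha}: {jc_gamma_distance(p, alpha):.4f}")
```

    The corrected distance grows as α shrinks, so the same observed differences imply more substitutions in the more heterogeneous HVS-II region.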

    Mismatch Distributions

    A one-wave-shaped distribution of the number of nucleotide differences between all pairs of individuals within a population (the mismatch distribution) is a signature of a past population expansion (Rogers and Harpending 1992), and extensive simulations have corroborated this finding (Slatkin and Hudson 1991; Rogers and Harpending 1992; Harpending et al. 1993). Figure 1 shows the mismatch distributions for the Native American haplogroups, for both the HVS-I and HVS-I–HVS-II data sets. The haplogroups' wave profiles are remarkably similar to each other, suggesting that these sequences were taken from the same ancestral population, which underwent a large expansion in the past. The waves are not so similar for the HVS-I–HVS-II data, probably because of smaller sample sizes and the much higher mutation-rate heterogeneity in the HVS-II region. The raggedness values for the HVS-I data set (table 2) are very low, as is generally found in expanding populations (Harpending et al. 1993).

    Several studies have shown that the phylogeny of a sample of genes taken from a population that has experienced a large expansion after a bottleneck has the shape of a star tree (DiRienzo and Wilson 1991; Slatkin and Hudson 1991; Rogers and Jorde 1995). Figure 2 shows the NJ tree of the 125 different HVS-I sequences from the four haplogroups; only two D sequences did not cluster with the others of their haplogroup. The statistical support for the haplogroups is high, all CP values being >85%, with the exception of haplogroup D, which does not have unique markers (see Forster et al. 1996). The most remarkable feature of this tree is that each haplogroup presents a clear star-shaped subtree, similar to what was found with the minimum-spanning tree (not shown). This result again supports the hypothesis of a large population expansion for Native Americans.

    Simulation Results

    Tajima's D values for the haplogroups were significantly negative for the HVS-I data set (table 2). Aris-Brosou and Excoffier (1996) demonstrated that a large (>100-fold) population expansion moves Tajima's D to significantly negative values but that mutation-rate heterogeneity shifts it to more-positive values. Therefore, the significantly negative D values obtained for the four haplogroups with the HVS-I data, despite the existence of a moderate mutation-rate heterogeneity (α = .5) in this region, strongly support a large expansion that affected all four haplogroups. However, the much higher mutation-rate heterogeneity (α = .16) in the HVS-I–HVS-II data shifted the D values into the nonsignificant range (inside the 95% CI), although they are still moderately negative, as was found by Aris-Brosou and Excoffier (1996) in their simulations. The much lower α (.07) for the HVS-II region alone shifted Tajima's D for haplogroups B and C to positive values (not shown).

    Figure 1 Mismatch distributions for the four haplogroups, with HVS-I and HVS-I–HVS-II data sets. F_i denotes the relative frequency of pairs of sequences that differ by i nucleotide sites.

    Figure 2 NJ tree of 125 different Native American HVS-I sequences from the four major haplogroups. All sequences clustered according to the haplogroup (A–D) to which they belong, except for two haplogroup D sequences. The interior-branch-test CP values for the main clusters are shown above the branches.

    Figure 3 shows the results of the simulations for each haplogroup when Tajima's D statistic and the finite-sites gamma-rates model are used for the simulation. The darker-shaded values denote specific combinations of degree of expansion and final θ (population size) that could not be rejected by the simulations; that is, these are scenarios in which >500 (>5%) of the simulations resulted in a value of D lower than that estimated from our HVS-I data set. The minimum degree of expansion not rejected by the simulations was 100-fold for haplogroups A, B, and C and 50-fold for D. Table 2 presents the maximum size of the initial population that was not rejected by the simulations (the initial θ was calculated by dividing the final θ by the degree of expansion for each combination of values that was not rejected, and the maximum value for each haplogroup was taken). From these values of initial θ, the effective number of females was calculated by $\theta_0 = 2N_f u$, as given above. These results suggest that each Native American haplogroup, as defined here, was founded by a small number of females. Since the values above are the numbers of females in the founding population that carried each haplogroup-founding sequence, to estimate the size of the whole founding population for the four haplogroups we should sum the individual values for each haplogroup, which results (when we use the 10.3% rate) in a maximum value of about 2,000 females and a founding population of <5,000 individuals. The possible existence of other, less successful founding haplogroups (e.g., see Forster et al. 1996; Merriwether and Ferrell 1996), which may account for 10% of the mtDNA now found in the Americas, may increase these estimates to some degree. These figures, although approximate, suggest that during the colonization process the ancestral population was never much larger than 10,000 individuals.
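    The conversion from the initial θ in Table 2 to numbers of females uses $\theta_0 = 2N_f u$, with u the per-generation mutation rate for the whole region. Two assumptions not stated in this section are needed to reproduce Table 2: that HVS-I spans the 360 positions 16024–16383 and that a generation lasts 25 years. With those assumed values, the 10.3%/Myr rate gives u ≈ 9.3 × 10^-4 per generation, which reproduces the tabled values to within rounding, and summing over haplogroups gives the founding-female total quoted above.

```python
SITES = 360            # HVS-I positions 16024-16383
GENERATION_YEARS = 25  # assumption: generation length is not stated in this section


def founding_females(theta0: float, rate_pct_per_myr: float,
                     sites: int = SITES, generation_years: float = GENERATION_YEARS) -> float:
    """N_f0 = theta0 / (2u), with u the per-generation rate for the whole region."""
    mu_site_year = rate_pct_per_myr / 100 / 1e6
    u = mu_site_year * sites * generation_years
    return theta0 / (2 * u)


theta0 = {"A": 0.25, "B": 0.3, "C": 1.0, "D": 2.0}  # Table 2, maximum initial theta
nf0 = {h: founding_females(t, 10.3) for h, t in theta0.items()}
total = sum(nf0.values())
print({h: round(v) for h, v in nf0.items()}, round(total))
# {'A': 135, 'B': 162, 'C': 539, 'D': 1079} 1915   (Table 2 lists 1,078 for D)
```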

    The use of the finite-sites model with gamma-distributed rates in the simulations, rather than the unrealistic infinite-sites model (e.g., see Eller and Harpending 1996), made the tests more stringent with respect to which scenarios could be rejected. Also, in distinguishing the stationary scenario from the expanding scenario, Tajima's D had much greater discriminating power than the raggedness statistic, especially when a mutation-rate-heterogeneity model was used in the simulations (not shown). The use of final, geographically structured populations, instead of a randomly mating one, in the simulations had no qualitative effect on the tests.
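For readers who want to reproduce the test logic, a minimal Python sketch follows. It implements Tajima's (1989) D from aligned sequences and the acceptance rule described above (a scenario is kept when more than 5% of simulated D values fall below the observed D). The coalescent simulations themselves (finite sites, gamma-distributed rates, expansion scenarios) are not reproduced; the list of simulated D values is assumed to come from some external simulator, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's (1989) D for equal-length aligned sequences (n >= 4)."""
    n = len(seqs)
    # Segregating sites: alignment columns that carry more than one state.
    s = sum(len(set(col)) > 1 for col in zip(*seqs))
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # Average number of pairwise differences (pi, per sequence).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def scenario_not_rejected(observed_d, simulated_ds, alpha=0.05):
    """Keep a scenario if more than a fraction alpha of simulated D values
    are lower than the observed D (the rule described in the text)."""
    below = sum(d < observed_d for d in simulated_ds)
    return below / len(simulated_ds) > alpha


# Star-like genealogy: every sequence carries one private mutation; the
# excess of singletons expected after an expansion gives a negative D.
star = ["CAAAA", "ACAAA", "AACAA", "AAACA", "AAAAC"]
print(tajimas_d(star))
```

A negative observed D that many simulated stationary scenarios fail to match is what allows those scenarios to be rejected.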

    Estimating the Age of the Four Native American mtDNA Haplogroups

    All the results that we have presented so far strongly argue in favor of the hypothesis that, in the process of the colonization of the Americas, each haplogroup underwent a bottleneck followed by a large (>100-fold) expansion. We are therefore justified in using the sequence diversity found in each Native American haplogroup as a measure of its expansion age. Table 1 shows the mean ages and their 95% CIs for each haplogroup, for both data sets and all mutation rates. The diversification times are very similar both to each other and between the HVS-I and HVS-I–HVS-II data sets. The average ages for the HVS-I values were 42,000 and 29,000 ybp, and those for the HVS-I–HVS-II values were 43,000 and 33,000 ybp, for the slower and faster substitution rates, respectively. The


    Figure 3 Simulation surfaces for the four haplogroups (A–D), by use of Tajima's D. The number of simulations for which Tajima's D was more negative than the value observed for each haplogroup is plotted for each value of final θ and degree of expansion. Darker shading denotes those models that could not be rejected at the 5% level; the remaining models could be rejected at the 5% level.


    lower-bound estimates for the averages were 25,000 (HVS-I) and 29,000 (HVS-I–HVS-II) ybp, and the minimum value for all estimates was 23,000 ybp, for haplogroup C with the HVS-I–HVS-II data. The upper-bound age was 50,000 ybp. The average divergence time between the haplogroups was >110,000 ybp, in general three times greater than the haplogroup ages. The ages estimated with the mismatch-distribution approach, by means of the method of moments, were identical or very similar to those calculated from nucleotide diversity (not shown), further indicating a strong initial bottleneck (Bonatto and Salzano 1997), although the CIs of the mismatch-distribution method were larger.
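The mismatch-distribution estimate mentioned above can be sketched as follows, assuming the standard method-of-moments estimators of Rogers and Harpending (1992): with mean $m$ and variance $v$ of the mismatch distribution, $\hat\theta_0 = \sqrt{v - m}$ and $\hat\tau = m - \hat\theta_0$, and $\tau = 2ut$ converts to time. The per-site, per-year rate `mu_site_year` and the sequence `length` are hypothetical inputs; the paper's slower and faster substitution rates are not reproduced. Passing the mean pairwise difference instead of $\tau$ gives the diversity-based age, which is how the two approaches can be compared.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def mismatch_distribution(seqs):
    """F[i] = relative frequency of sequence pairs that differ at i sites.

    Requires at least two equal-length aligned sequences."""
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts[i] / len(diffs) for i in range(max(diffs) + 1)]


def moments_estimate(freqs):
    """Method-of-moments (theta0, tau) from a mismatch distribution, following
    Rogers and Harpending (1992): theta0 = sqrt(v - m), tau = m - theta0."""
    m = sum(i * f for i, f in enumerate(freqs))
    v = sum((i - m) ** 2 * f for i, f in enumerate(freqs))
    # If v <= m, theta0 is set to 0 (no detectable pre-expansion diversity).
    theta0 = sqrt(max(v - m, 0.0))
    return theta0, m - theta0


def age_years(tau, mu_site_year, length):
    """Convert tau = 2 u t to years, with u = mu_site_year * length
    (mutation rate per sequence per year). Passing the mean pairwise
    difference gives a diversity-based age, assuming essentially no
    diversity before the bottleneck."""
    return tau / (2.0 * mu_site_year * length)


seqs = ["CAAAA", "ACAAA", "AACAA", "AAACA", "AAAAC"]  # toy alignment
print(moments_estimate(mismatch_distribution(seqs)))
```

When the initial θ is close to zero, $\hat\tau$ approaches the mean pairwise difference, so the two age estimates converge; a strong initial bottleneck produces exactly that agreement.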

    Discussion

    When we consider mainly the HVS-I data, the pattern of shared polymorphisms, the mismatch distribution (fig. 1), the phylogenetic tree (fig. 2), the values of Tajima's D and raggedness (table 2), and the simulation results (fig. 3) all suggest that the four major haplogroups underwent a bottleneck followed by a large population expansion. These results give strong support for our further use of the within-haplogroup sequence diversity as an estimate of the time since that bottleneck. The very similar diversity values found for the four haplogroups, with both the HVS-I and the HVS-I–HVS-II data sets, strongly suggest that they all expanded at approximately the same time and, therefore, that they most likely came from the same population, a result that agrees with the single-migration model suggested by several recent studies (Merriwether et al. 1995; Forster et al. 1996; Kolman et al. 1996; Bonatto and Salzano 1997).
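The raggedness statistic cited above can be computed from a mismatch distribution as in the following sketch, which assumes Harpending's (1994) definition $r = \sum_{i=1}^{d+1} (F_i - F_{i-1})^2$, with $d$ the largest observed number of differences and $F_{d+1} = 0$. Smooth, unimodal distributions, as expected after a population expansion, give small values; ragged, multimodal ones give larger values. The function name and the two example distributions are hypothetical.

```python
def raggedness(freqs):
    """Harpending's (1994) raggedness index of a mismatch distribution.

    freqs[i] is the relative frequency F_i of pairs differing at i sites,
    for i = 0..d; a trailing F_{d+1} = 0 is appended before summing.
    """
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


smooth = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical unimodal, expansion-like shape
ragged = [0.3, 0.0, 0.4, 0.0, 0.3]  # hypothetical multimodal, ragged shape
print(raggedness(smooth), raggedness(ragged))
```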

    In our previous study (Bonatto and Salzano 1997), using mainly haplogroup A sequences, we concluded that those mtDNA data strongly indicate that all Native Americans originated from a single colonization event that occurred in Beringia >22,000 years ago, possibly 30,000–40,000 ybp. We suggested a scenario, based on Szathmary's work (e.g., see Szathmary 1993), in which the Native American ancestral population settled in the Beringian landmass for some time before expanding. Eventually they crossed the Alberta ice-free corridor and colonized the rest of the American continent. The collapse of that corridor, 25,000–14,000 (Hoffecker et al. 1993) or 30,000–11,000 (Lemmen et al. 1994) ybp, isolated the people still living in Beringia, from whom the Na-Dene and Eskimos (with their reduced overall mtDNA diversity) originated; those south of the ice sheets gave rise to the Amerind-speaking peoples. The present results for the diversification ages of the four major haplogroups agree very well with these estimates. When only the mean values are considered, these estimates suggest a very early date (30,000–40,000 ybp) for the beginning of the diversification of the Native American ancestral population, with a lower bound of 25,000 ybp.

    At least two types of evidence support the idea that the haplogroups' sequence differentiation probably began during Beringia's settlement and not in Asia before the colonization process: (1) our estimates of an ancient population expansion on the order of 100-fold suggest that the diversification began during an intensive colonization process; and (2) if the expansion had occurred elsewhere in Asia, then sequences carrying all the markers of each haplogroup should be found there in large numbers and at high frequency, similar to the 90% frequency found in Native Americans; however, only the founding sequences for each haplogroup have been found in Asia so far, and at very low frequency (see Forster et al. 1996; Kolman et al. 1996; Bonatto and Salzano 1997). The few additional founding sequences for haplogroup A that have been suggested in the Na-Dene and Eskimo (see Forster et al. 1996) are probably derived ones and will be discussed elsewhere (authors' unpublished data).

    We agree that some additional founding haplogroups (such as group X of Forster et al. 1996; also see Bailliet et al. 1994; Merriwether and Ferrell 1996) might exist besides the four major ones studied here. However, they constitute only 10% of the sequences now found in the Americas and, because of their very small sample sizes, could not be analyzed in this study. Since we analyzed each haplogroup separately, and since the number of haplogroups was not a relevant parameter, including these putative additional founding haplogroups should not significantly change the results presented here.

    Some recent studies have also tried to estimate the time of entry into the Americas from haplogroup-diversity values, on the basis of both RFLP data (Torroni et al. 1992, 1994) and CR sequence variation (Forster et al. 1996). We emphasize that our inferred CIs took into account both the mutation-rate heterogeneity and the nucleotide-diversity variance, whereas the estimates of other recent studies considered, at most, only one source of error; the CIs for their estimates would be much broader than the ranges they have provided. As for Torroni et al.'s (1992, 1994) hypothesis, our previous results do not support the idea of an independent Na-Dene migration (Bonatto and Salzano 1997), and our present analyses also do not support their suggestion of a more recent haplogroup B migration. Similarly, neither Horai et al.'s (1993) proposal of different migrations, 14,000–21,000 ybp, for each haplogroup nor the hypothesis of a Polynesian contribution to the haplogroup B sequences found in America (see Bonatto et al. 1996) was supported. In any case, Torroni et al.'s (1994) estimated average arrival date, 26,000–34,000 ybp, for


    the other three haplogroups is very close to our estimates (table 1).
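The point that ignoring one source of error narrows a CI can be illustrated with a small Monte Carlo sketch. It is a hypothetical illustration, not the procedure used for table 1: it assumes that nucleotide diversity per site is normally distributed around its estimate with a given standard error, that the per-site, per-year mutation rate is uniformly uncertain between a slower and a faster value, and that age is $t = \pi / (2\mu)$. All numbers are placeholders.

```python
import random


def age_interval(pi_hat, pi_se, mu_slow, mu_fast, vary_rate=True, n=20000, seed=1):
    """Approximate 95% interval for t = pi / (2 * mu) by Monte Carlo.

    pi_hat, pi_se: nucleotide diversity per site and its standard error.
    mu_slow, mu_fast: bounds of a per-site, per-year mutation rate.
    vary_rate=False fixes mu at the midpoint, i.e. ignores rate uncertainty.
    """
    rng = random.Random(seed)
    mu_mid = 0.5 * (mu_slow + mu_fast)
    ages = []
    for _ in range(n):
        pi = max(rng.gauss(pi_hat, pi_se), 0.0)  # diversity cannot be negative
        mu = rng.uniform(mu_slow, mu_fast) if vary_rate else mu_mid
        ages.append(pi / (2.0 * mu))
    ages.sort()
    return ages[int(0.025 * n)], ages[int(0.975 * n) - 1]


# Hypothetical inputs, for illustration only.
both = age_interval(0.012, 0.002, 1.2e-7, 2.0e-7)
pi_only = age_interval(0.012, 0.002, 1.2e-7, 2.0e-7, vary_rate=False)
print(both, pi_only)  # expect the first interval to be wider
```

Under these assumptions the rate uncertainty adds to, rather than replaces, the sampling uncertainty in π, so an interval built from only one source is too narrow.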

    In general, Forster et al.'s (1996) scenario for the peopling of the Americas is similar to the one we proposed (see above and Bonatto and Salzano 1997). They postulated a single and early entry (>20,000 ybp) and suggested that, although the Amerinds colonized the whole continent and maintained their original diversity, the Beringians (Eskimo and Na-Dene) lost diversity because of the deteriorating climate until 11,000 ybp, at which time they reexpanded to their present size. Forster et al. also presented coalescence ages for Native American haplogroups, using a data set very similar to our HVS-I data set but very different methods to estimate the haplogroups' ages. Although they did not calculate any CI for their age estimates, they suggested 20,000–25,000 ybp as the arrival time of the Amerinds, which is near our lower-bound estimates. Their haplogroup coalescence ages, however, are probably underestimates of the diversification times since these populations entered the Americas, because they estimated the diversity values for each haplogroup within each tribe separately. Their results would therefore be strongly influenced by the recent demographic history of each tribe, which could significantly change the ancient parameters that we are interested in estimating. A good example of this can be seen in their estimated age for the Central American Amerinds, which showed a coalescence age lower than that of the South Americans. Far from suggesting that Central American Amerinds originated more recently than South American Amerinds, this result only reflects the reduced mtDNA diversity found in the Chibcha groups, from which all the Central American mtDNA sequences came. The Chibchas' reduced mtDNA diversity is thought to have been caused by recent events (Kolman et al. 1995).

    Acknowledgments

    We thank Mark Stoneking for helpful comments on an earlier version of this manuscript. This work was funded by Financiadora de Estudos e Projetos, Conselho Nacional de Desenvolvimento Científico e Tecnológico, and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.

    References

    Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, et al (1981) Sequence and organization of the human mitochondrial genome. Nature 290:457–465

    Aris-Brosou S, Excoffier L (1996) The impact of population expansion and mutation rate heterogeneity on DNA sequence polymorphism. Mol Biol Evol 13:494–504

    Bailliet G, Rothhammer F, Carnese FR, Bravi CM, Bianchi NO (1994) Founder mitochondrial haplotypes in Amerindian populations. Am J Hum Genet 55:27–33

    Batista O, Kolman CJ, Bermingham E (1995) Mitochondrial DNA diversity in the Kuna Amerinds of Panama. Hum Mol Genet 4:921–929

    Bonatto SL, Redd AJ, Salzano FM, Stoneking M (1996) Lack of ancient Polynesian-Amerindian contact. Am J Hum Genet 59:253–256

    Bonatto SL, Salzano FM (1997) A single and early origin for the peopling of the Americas supported by mitochondrial DNA sequence data. Proc Natl Acad Sci USA 94:1866–1871

    Cavalli-Sforza LL, Piazza A, Menozzi P (1994) History and geography of human genes. Princeton University Press, Princeton

    Di Rienzo A, Wilson AC (1991) Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc Natl Acad Sci USA 88:1597–1601

    Donnelly P, Tavaré S (1995) Coalescents and genealogical structure under neutrality. Annu Rev Genet 29:401–421

    Easton RD, Merriwether DA, Crews DE, Ferrell RE (1996) mtDNA variation in the Yanomami: evidence for additional New World founding lineages. Am J Hum Genet 59:213–225

    Eller E, Harpending H (1996) Simulations show that neither population expansion nor population stationarity in a west African population can be rejected. Mol Biol Evol 13:1155–1157

    Excoffier L, Smouse P, Quattro J (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA data. Genetics 131:479–491

    Forster P, Harding R, Torroni A, Bandelt H-J (1996) Origin and evolution of Native American mtDNA variation: a reappraisal. Am J Hum Genet 59:935–945

    Ginther C, Corach D, Penacino GA, Rey JA, Carnese FR, Hutz MH, Anderson A, et al (1993) Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes. In: Pena SDJ, Chakraborty R, Epplen JT, Jeffreys AJ (eds) DNA fingerprinting: state of the science. Birkhäuser, Basel, pp 211–219

    Harpending HC (1994) Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Hum Biol 66:591–600

    Harpending HC, Sherry ST, Rogers A, Stoneking M (1993) The genetic structure of ancient human populations. Curr Anthropol 34:483–496

    Hoffecker JF, Powers WR, Goebel T (1993) The colonization of Beringia and the peopling of the New World. Science 259:46–53

    Horai S, Hayasaka K, Kondo R, Tsugane K, Takahata N (1995) Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc Natl Acad Sci USA 92:532–536

    Horai S, Kondo R, Nakagawa-Hattori Y, Hayashi S, Sonoda S, Tajima K (1993) Peopling of the Americas, founded by four major lineages of mitochondrial DNA. Mol Biol Evol 10:23–47

    Kimura M (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

    Kocher TD, Wilson AC (1991) Sequence evolution of mitochondrial DNA in humans and chimpanzees: control region and protein-coding region. In: Osawa S, Honjo T (eds) Evolution of life: fossils, molecules and culture. Springer, Tokyo, pp 391–413

    Kolman CJ, Bermingham E, Cooke R, Ward RH, Arias TD, Guionneau-Sinclair F (1995) Reduced mtDNA diversity in the Ngobe Amerinds of Panama. Genetics 140:275–283

    Kolman CJ, Sambuughin N, Bermingham E (1996) Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics 142:1321–1334

    Lemmen DS, Duk-Rodkin A, Bednarski JM (1994) Late glacial drainage systems along the northwestern margin of the Laurentide ice sheet. Q Sci Rev 13:805–828

    Merriwether DA, Ferrell RE (1996) The four founding lineage hypothesis for the New World: a critical reevaluation. Mol Phylogenet Evol 5:241–246

    Merriwether DA, Rothhammer F, Ferrell RE (1995) Distribution of the four founding lineage haplotypes in Native Americans suggests a single wave of migration for the New World. Am J Phys Anthropol 98:411–430

    Monsalve MV, Cardenas F, Guhl F, Delaney AD, Devine DV (1996) Phylogenetic analysis of mtDNA lineages in South American mummies. Ann Hum Genet 60:293–303

    Redd AJ, Takezaki N, Sherry ST, McGarvey ST, Sofro ASM, Stoneking M (1995) Evolutionary history of the COII/tRNALys intergenic 9 base pair deletion in human mitochondrial DNAs from the Pacific. Mol Biol Evol 12:604–615

    Rogers A (1995) Genetic evidence for a Pleistocene population explosion. Evolution 49:608–615

    Rogers A, Fraley AE, Bamshad MJ, Watkins WS, Jorde LB (1996) Mitochondrial mismatch analysis is insensitive to the mutation process. Mol Biol Evol 13:895–902

    Rogers A, Harpending H (1992) Population growth makes waves in the distribution of pairwise genetic differences. Mol Biol Evol 9:552–569

    Rogers A, Jorde L (1995) Genetic evidence on modern human origins. Hum Biol 67:1–36

    Rzhetsky A, Nei M (1992) A simple method for estimating and testing minimum-evolution trees. Mol Biol Evol 9:945–967

    Salzano FM, Callegari-Jacques SM (1988) South American Indians: a case study in evolution. Clarendon Press, Oxford

    Santos M, Ward RH, Barrantes R (1994) mtDNA variation in the Chibcha Amerindian Huetar from Costa Rica. Hum Biol 66:963–977

    Santos SEB, Ribeiro-dos-Santos AKCR, Meyer D, Zago MA (1996) Multiple founder haplotypes of mitochondrial DNA in Amerindians revealed by RFLP and sequences. Ann Hum Genet 60:305–319

    Shields GF, Schmiechen AM, Frazier BL, Redd A, Voevoda MI, Reed JK, Ward RH (1993) mtDNA sequences suggest a recent evolutionary divergence for Beringian and northern North American populations. Am J Hum Genet 53:549–562

    Slatkin M, Hudson RR (1991) Pairwise comparisons of mitochondrial DNA sequence in stable and exponentially growing populations. Genetics 129:555–562

    Stoneking M, Sherry ST, Redd AJ, Vigilant L (1992) New approaches to dating suggest a recent age for the human mtDNA ancestor. Philos Trans R Soc Lond B 337:167–175

    Szathmary EJE (1993) Genetics of aboriginal North Americans. Evol Anthropol 1:202–220

    Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595

    Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512–526

    Torroni A, Neel JV, Barrantes R, Schurr TG, Wallace DC (1994) Mitochondrial DNA clock for the Amerinds and its implications for timing their entry into North America. Proc Natl Acad Sci USA 91:1158–1162

    Torroni A, Schurr TG, Cabell MF, Brown MD, Neel JV, Larsen M, Smith DG, et al (1993) Asian affinities and continental radiation of the four founding Native American mtDNAs. Am J Hum Genet 53:563–590

    Torroni A, Schurr TG, Yang C-C, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, et al (1992) Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics 130:153–162

    Wakeley J (1993) Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J Mol Evol 37:613–623

    Wallace DC (1995) Mitochondrial DNA variation in human evolution, degenerative disease, and aging. Am J Hum Genet 57:201–223

    Ward RH, Frazier BL, Dew-Jager K, Pääbo S (1991) Extensive mitochondrial diversity within a single Amerindian tribe. Proc Natl Acad Sci USA 88:8720–8724

    Ward RH, Redd A, Valencia D, Frazier B, Pääbo S (1993) Genetic and linguistic differentiation in the Americas. Proc Natl Acad Sci USA 90:10663–10667

    Ward RH, Salzano FM, Bonatto SL, Hutz MH, Coimbra CEA Jr, Santos RV (1996) Mitochondrial DNA polymorphism in three Brazilian Indian tribes. Am J Hum Biol 8:317–323

    Yang Z (1996) Among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372

    Yang Z, Kumar S (1996) Approximate methods for estimating the patterns of nucleotide substitution and the variation of substitution rates among sites. Mol Biol Evol 13:650–659