The Prior Probabilities of Phylogenetic Trees

19
The prior probabilities of phylogenetic trees Joel D. Velasco Received: 25 June 2007 / Accepted: 16 December 2007 / Published online: 17 January 2008 Ó Springer Science+Business Media B.V. 2008 Abstract Bayesian methods have become among the most popular methods in phylogenetics, but theoretical opposition to this methodology remains. After pro- viding an introduction to Bayesian theory in this context, I attempt to tackle the problem mentioned most often in the literature: the ‘‘problem of the priors’’—how to assign prior probabilities to tree hypotheses. I first argue that a recent objection— that an appropriate assignment of priors is impossible—is based on a misunder- standing of what ignorance and bias are. I then consider different methods of assigning prior probabilities to trees. I argue that priors need to be derived from an understanding of how distinct taxa have evolved and that the appropriate evolu- tionary model is captured by the Yule birth–death process. This process leads to a well-known statistical distribution over trees. Though further modifications may be necessary to model more complex aspects of the branching process, they must be modifications to parameters in an underlying Yule model. Ignoring these Yule priors commits a fallacy leading to mistaken inferences both about the trees themselves and about macroevolutionary processes more generally. Keywords Base rate fallacy Bayesianism Phylogenetic trees Phylogenetics Prior probabilities Systematics Tree shape Yule process Introduction Finding the solution to biological problems such as determining whether or not a Florida dentist passed HIV on to his patients (he did—Metzger et al. 2002), calculating whether or not brain size and testicle size are adaptively correlated in bats (they are anti-correlated—Pitnick et al. 2006), and determining how terrestrial J. D. Velasco (&) Department of Philosophy, University of Wisconsin, 5185 White Hall, 600 North Park Street, Madison, WI 53706, USA e-mail: [email protected] 123 Biol Philos (2008) 23:455–473 DOI 10.1007/s10539-007-9105-7

description

2008-

Transcript of The Prior Probabilities of Phylogenetic Trees

ThepriorprobabilitiesofphylogenetictreesJoelD.VelascoReceived:25June2007 / Accepted:16December2007 / Publishedonline:17January2008SpringerScience+BusinessMediaB.V.2008Abstract Bayesianmethods havebecomeamongthemost popular methods inphylogenetics, but theoretical oppositiontothismethodologyremains. After pro-vidinganintroductiontoBayesiantheoryinthiscontext, I attempt totackletheproblemmentionedmostoftenintheliterature:theproblemofthepriorshowto assign prior probabilities to tree hypotheses. I rst argue that a recent objectionthat anappropriateassignment of priors is impossibleis basedonamisunder-standing of what ignorance and bias are. I then consider different methods ofassigningpriorprobabilitiestotrees.Iarguethatpriorsneedtobederivedfromanunderstandingof howdistinct taxahaveevolvedandthat theappropriateevolu-tionarymodeliscapturedbytheYulebirthdeathprocess.Thisprocessleadstoawell-knownstatisticaldistributionovertrees.Thoughfurthermodicationsmaybenecessarytomodel morecomplexaspectsofthebranchingprocess, theymust bemodications to parameters in an underlying Yule model. Ignoring these Yule priorscommitsafallacyleadingtomistakeninferencesbothabout thetreesthemselvesandaboutmacroevolutionaryprocessesmoregenerally.Keywords Baseratefallacy Bayesianism Phylogenetictrees Phylogenetics Priorprobabilities Systematics Treeshape YuleprocessIntroductionFindingthesolutiontobiological problemssuchasdeterminingwhetherornot aFlorida dentist passed HIVon to his patients (he didMetzger etal. 2002),calculatingwhetherornot brainsizeandtesticlesizeareadaptivelycorrelatedinbats(theyareanti-correlatedPitnicketal.2006),anddetermininghowterrestrialJ.D. Velasco(&)DepartmentofPhilosophy,UniversityofWisconsin,5185WhiteHall, 600NorthParkStreet,Madison,WI53706,USAe-mail:[email protected] 3BiolPhilos(2008)23:455473DOI10.1007/s10539-007-9105-7mammals arrived in Madagascar (multiple separate rafting events rather than a landbridgePouxetal. 2005) all require knowledge of the evolutionary historyofcertain groups. Recovering this history is the project of phylogenetic inferencethegoalistobuildaphylogeny, orgenealogicalhistory,ofagroupofgenes,species,highertaxa, orwhatevertheobjectsofstudyhappentobe.This paper advocates the use of a particular methodology of phylogeneticinferenceBayesianinference. BeforeI offer anyjusticationfor this, I brieydescribetheproblemof phylogeneticinferenceanddescribeBayesianisminthiscontext. I thenprovideaminimal defensefor theBayesianapproach. For manysystematists, thereasontopreferothermethodscomesnotfromtheirbeliefinthecorrectnessoftheirpreferredmethodology, but ratherisaresponsetoasupposedproblemfor Bayesianismthe problemof the priors. By correcting seriousmisunderstandings about this problem and developing the beginning of a solution, IhopetobolstertheoveralldefenseofBayesianphylogenetics.In a typical problem of phylogenetic inference, we are concerned with recoveringfacts about the genealogies of particular biological groups at a variety of levels. Thedata used to construct these phylogenies can in theory be morphological, ecological,molecular, or anynumber of different types of information, but themajorityofpublished phylogenies today come from DNA sequences of individual organisms. Itisassumedthat thetrueunderlyinghistoryofthesesequencesisthat ofcommonancestry and descent with modication.This historyis thenrepresented as a binarybranchingtree. Thetips, or leaves of thetreearetheDNAsequencesandtheinternal nodes represent common ancestorsthe points in the past of coalescencewhenthedescendantsequencestracebacktothesametokensequencepresentinasingleindividual. Thoughaconclusionabout thephylogenyof species is nearlyalwaysdrawn, thephilosophicallymindedreaderissuretorecognizethatmovingfromatreeof DNAsequences toatreeof another kindsuchas aspecies treerequires a conceptual leap; this second stage of inference needs a separatediscussionofitsownandcansafelybeignoredhere.The phylogeny, the evolutionary tree, or just simply the tree may or maynot contain informationsuch as branching dates, rates of change along branches, orancestral character states, but it must give at least a branching diagram with the tipslabeled. This informationuniquelyspecies anyandall clades, or monophyleticgroups, on the tree. This branching diagramis called the tree topology and isgenerallytheprimaryobjectofinferenceforthesystematistbecauseknowledgeofthetopologyisaprerequisiteformost further inquiriesabout thehistory. Unlessspeciedotherwise,treehererefersjusttothetreetopology.BayesianphylogeneticsMaximumParsimonyandMaximumLikelihoodaretwofamiliesofmethodsthathavedominatedphylogeneticsdiscussionforthepast20yearsandbothhavetheiradvocates (Felsenstein 2004). Although there is a long and rich history of the studyof Bayesian statistics generally, it is only in the past ten years that Bayesian methodsof inference have been used in phylogenetic studies (Rannala and Yang 1996,Huelsenbeck etal. 2001). Bayesianismhas taken some time to catch on in456 J.D.Velasco1 3popularity and the details and their consequences certainly have not been as widelydiscussedasthoseattachingtoother methods(Randleetal. 2005). For example,Felsenstein in his attempt at a comprehensive textbook Inferring Phylogenies (2004)spends only one of 35 chapters on Bayesian methods. However, Bayesianmethodologyisgainingpopularitywithtimeandtodayitiswidelyusedalongsideothermethodsinpublishedresults.ThecentralideainBayesianphylogeneticsisthatallinferencesshouldbemadebyutilizingtheposteriorprobabilitydistributionofthetrees. Bayestheoremhasthefollowingconsequence:TheprobabilitythatatreeiscorrectgiventhesequencedatathatwehavePrTree j Data PrData j Tree PrTreePrDataPr(Tree), called the prior probability of the tree, is determined from a probabilitydistribution over all possible trees given before the data are examined. TheprobabilityofthedataPr(Data)isanormalizingconstantsimplyusedtomakesure that the posterior probabilities sumto 1. It is equal to the sumof theprobabilitiesofgettingthedataoneverypossibletreeweightedbytheparticulartreespriorprobability. LabelingeachtreetopologyasT1,T2,...Ti,wehave:PrData XTi PrData j Ti PrTiPr(Data|Tree) is called the likelihood of the tree, but it cannot be directly calculatedsincethetreetopologyalonedoes not giveussufcient informationtoassignaprobabilitytothedata. Rather, weneedadditionalinformationsuchasthebranchlengths(theexpectednumberofchangespersitealongaparticularbranch)alongwithsomemodelofevolutionthatwillcontainitsownparameterstobeestimatedsuchasthenucleotidesubstitutionrates.TheBayesianmethodfor dealingwiththesenuisanceparameters (parametersthat arent of primary interest) is to average over them by integrating them out. Inthe frequentist method called MaximumLikelihood, for each tree, nuisanceparameterssuchasbranchlengthsandsubstitutionmodelparametersaresetatthevaluethat wouldmaximizetheprobabilityofthedataonthat particulartree. TheMaximumLikelihoodtreeisbydenitionthetreewhichisaconjunctinthetree-plus-nuisance-parametersconjunctionwhichmakesthedatamost probable. Thus,confusingly, the likelihood of the tree used in Bayes Theorem is not the same as thetreesLikelihoodscoreusedforMaximumLikelihoodinferences.Treating nuisance parameters in the Bayesian way, if we denote a xed set of branchlengths as v and a xed set of parameter values of the model as h we now have:PrDatajTi ZvZhPrDatajTi; v; h Prv; hjTidvdhSubstitutioninboththenumeratoranddenominatoryieldsthisformula:PrTijData RvRh PrDatajTi; v; h Prv; hjTivdh PrTiPTiRvRh PrDatajTi; v; h Prv; hjTidvdh PrTiThepriorprobabilitiesofphylogenetictrees 4571 3The above formula tells us the posterior probability of any particular treehypothesis. If we are interested in something else, say the probability that aparticulargroupformsaclade, sincethisis, ineffect, alargedisjunction(thetruetree couldbe any one of the trees that contains that clade), the posterior probabilityof that cladeis simplythesumof theposterior probabilities of all trees whichcontainthat clade. Theprobabilitydistributionofanyotherparameterssuchasabranch length, the individual substitution rates, or the ratio of transitions totransversionsareallsimilarlycalculated.TheBayesianphilosophythusprovidesaframeworkfor answeringahost of relevant theoretical questionsall at thesametime.Of course actually calculatingthe full posterior distribution is another matter.However, there is reason to be hopeful here. Computational methods fornumerically estimating multi-dimensional integrals are quite advanced. ThestandardideaistouseMarkovChainMonteCarlo(MCMC)methodstoestimatethe posterior distribution. For an introduction to these methods in phylogenetics seeLargetandSimon(1999)andLarget(2005).Howwecanactuallyinfertheposteriorprobabilityandhowwecandoit inacomputationallyefcient mannerareimportant practicalquestions, butitistothetheoretical issues that I now turn. These questions assume that we have access to thevarious posterior probability distributions and ask why we should use these to makeourphylogeneticinferencesratherthanusesomeotherquantity. Iftherearedeeptheoretical problemswithBayesianmethodology, it hardlymattersifwehaveanefcientwayofcalculatingtherelevantprobabilities.One important, though hardly decisive, consequence of Bayesian methodology istheeaseofinterpretingresults. Sincetheposteriorprobabilityofatreejust istheprobabilitythatthetreeiscorrect(givenourdataandourmodelofevolution),thetree with the highest posterior probabilityis the treewhich is the bestsupported. Infact, thestrengthof itssupport ismeasureddirectlybytheposterior. Other factsabout the problem, such as which tree would require the fewest nucleotidesubstitutions, which is what the Parsimony score captures, are of interest only in sofarastheyareareliableguidetowhichtreeistrue(whichtheyoftenarent).Inaddition,unlikeothermethods,wecanjudgethestrengthoftheevidenceforallaspectsofthetreeatthesametimewithoutneedingtoreanalyzethedatausingdifferenttechniques.Theprobabilitythataparticulargroupformsaclade,thattwoparticular sequences have coalesced in the last one million years, that sequence A ismore closely related to B than to C, or any other question about the tree is measuredusingtheposterior distribution. Noneof theseproblemsareeasilyanalyzedwithother methods whichareusuallydesignedjust tondthebest topology. Whileparticular tests have been developed (see Felsenstein (2004) for a host of examples)nonehasastraightforwardstatisticalinterpretationthat isuseful andassuchtheygenerally appear to be disjoint, ad-hoc tests with no underlying, unied justication.While a theoretical justication can be constructed for using posteriorprobabilitiestoguideourinferences, Iwillnotattempttodosohere(forthatanda host of similar references, see Howson and Urbach (2005)). Rather, I will focus ona fewreasons why one might object to Bayesianismin this context. Somesystematists believe that probabilities and perhaps even all statistical methods458 J.D.Velasco1 3simply cannot be used to make inferences concerning a particular groupsevolutionaryhistorysinceit isauniqueevent meaningit hasoccurredonlyonce(SiddallandKluge1997, butseeHaber2005)orbelievethatParsimonyhassomespecialjusticationapartfromitsstatisticalbehavior(seeFarris1983;Kluge2005oranyofahostofpapersin-between, butseeSober1988). Fromthosewhoaremorestatisticallyminded,thereareworriesthattheposteriorsmightbeoverlysensitivetothechoiceofanevolutionarymodel orthat Bayesianinferencetreatsnuisanceparameters as randomvariables andthus is not properlyfrequentist asMaximumLikelihood appears to be (though see Yang 2006). While these areimportant objections, they have been dealt with elsewhere (besides the abovereferences, seeforexampleHuelsenbeckandRonquist 2005) andIwillnotdiscussthemfurther.Whiletherearecertainlymoreissuestodiscuss, thecentralproblemwhichhasyet to be adequately dealt with and is perhaps the most common objection toBayesianphylogeneticsandtoBayesianinferencemoregenerallyistheproblemofthepriorshowtoassignpriorprobabilitiestothehypothesesundertest. AsFelsenstein, an advocate of frequentist methods, puts it: If the prior is agreed by allto be a valid one, then there can be no controversy about using Bayesian inference(Felsenstein2004:300).Whiletherewouldofcoursestillbecontroversy,hispointisthattothestatisticallymindedtheoretician, thereshouldntbe.While a subjective Bayesian may respond that prior probabilities ought simply torepresentthepriorbeliefsoftheparticularinvestigator,itiscertainlyaworthwhileproject to attempt to model a certain kind of ignorance for the use of priors. This is adirect attempt to avoid biasing the results in favor of our prior conceptions. After all,we want results that ought to be taken seriously by a wide range of scientists and wemay want to know what just this data should lead us to believe. This is one of thegoals of so-calledobjectiveBayesianism (perhaps better calledInterpersonalBayesianismKadane 2006). As long as we have a proper understanding ofignorance, it would appear, at least in certain cases, that we should attempt to modelignoranceinthepriors. But therearemanythingsthat weappear tobeignorantaboutthe tree topology, its branch lengths, which groups form clades, etc. It mightseemthatmodelingignorancewithrespecttosomeofthesefactorsissimpleforexample, to model ignorance with respect to tree topologies we should assign equalpriorprobabilitiesforalltopologies. Thisdistributioniscalledauniformpriorontopologies. However, therearemanydifferent waysof conceivingof atree. Theshape of the tree refers to the branching diagram with no labels at the tips and so hasless information than the topology. The topology is simply an unlabeled shape withlabelsaddedtothetips. Inaddition, wemaybeinterestedinmorethanjust thetopology. Thelabeledhistory(sometimescalledrankedtopologye.g. SempleandSteel 2003)refersto the topologyplus a temporal ordering ofthe nodes. Thesedifferenceswillbecomeimportantlater;theyaredepictedinFig. 1.Togetatopologyfromashape,labelsareaddedtothetips.Inthisexample,ifB and C were switched, we would have the same shape but a differenttopology. Recall that topologies do not specify the time at which the nodes occur. Ina labeled history, the nodes are labeled to represent their relative temporal ordering.Inthisexample,Cand Dsplit fromeachother(node 3) beforeAandB split (nodeThepriorprobabilitiesofphylogenetictrees 4591 34). If 3 and 4 were reversed, we would have the same topology but a differentlabeledhistory.Someshapesareconsistent withmoretopologiesthanothers; ifeachtopologyhas an equal prior, not all shapes will be equally probable. Similarly, sometopologiesareconsistentwithmorelabeledhistoriesthanotherssoassigningequalpriorstoall topologiesmeansthat not all shapesnor all labeledhistorieswill beequallyprobable. Thisisapparentlyaproblemsinceit wouldappearthat weareignorantwith respect to each but yet we cannot model ignorance with respect to allthree. However, I suggest that this way of thinking about ignorance is a mistake. Weare not ignorant of everything regarding topology and shapeafter all, we know thelogical facts that connect them. The kind of ignorance we ought to be modeling doesnotalwaysleadtouniformpriors.AsanexampleofwhatImean, Inowturntoarecent example that purports to showthat the use of priors in phylogeneticsinevitablyleadstobiasresults.PriorsoncladesNearly every published paper using Bayesian methods uses a uniformpriordistribution on tree topologies which assigns equal prior probability to each possibletopology. Partlythisismotivatedbythesimplicityoftheproposalcombinedwithits being the only distribution available (other than entering your own constraints forparticularclades)inpopularcomputerprogramssuchasMr. Bayes(HuelsenbeckandRonquist 2001). And without careful examination, the proposal does seemsensibleafterall, whyshouldwehaveapriorpreferenceforonetopologyoveranother when the topology itself is the primary object that we are trying to infer? Infact, by not using priors at all, if used as a guide to truth, Parsimony and Likelihoodanalysis are carriedout inawaythat effectivelytreat all topologies as equallyShape(unlabeled branching diagram)Topology(add labels to tips)Labeled History(time order the nodes)E D C B AE D C B A4 321Fig.1 Threedifferentaspectsofatree460 J.D.Velasco1 3probableapriori.Thisfacthasnotbeentraditionallyseenasbiasingresultsinanyway. But asPickett andRandle(2005)(henceforthP&R)point out, auniformprior distributionontopologies implies a non-uniformdistributionon the priorprobabilities of cladesin particular, the probability that a particular group forms aclade depends on its size relative to the total number of taxa in the analysis. Smallerandlarger groups have higher probabilities while middle-sizedgroups have thelowestprobabilities.Figure2providesanexample,whensetsofdifferentsizesaredrawnfrom50taxaplacedatthetips(orleaves)ofthetree.Whiletheparticularvalueswouldchangewithadifferentnumberoftaxainthestudy, the shape of the curve will not. Any arbitrary group of taxa is a possible cladeand P&R contend that all such groups regardless of their size should have the sameprior probability of forming an actual clade in a given problem with a xed group oftaxa.Analyzingsimulateddataaswellasdatafromseventeenpublishedempiricalstudies,P&Rarguethattheuseoftheuniformdistributionhasbiasedtheposteriorprobabilitiesinpredictableways,namely,thattheverysmallestandlargestcladestypicallyhavethehighestposteriorsprobabilitiesandthemiddle-sizedcladeshavethelowest. Thisresult correspondstothepriordistributiononcladesimposedbysettingauniformpriorovertopologies.Severalsubsequentpapersandbookshavecitedthisfact(e.g.GoloboffandPol2005andYang2006)anddifferentexampleshavebeenproducedwhichleadtothesameresults. Theauthorsagreethat thesefactsleadtodevastatingconclusionsfortheBayesian.Thislineofthinkingisbasedonmisunderstandingwhatitisfortheposteriortobebiasedandwhat theappropriateunderstandingof ignorance is. It is entirelyproper for different sizedclades tobe more or less probable a priori since theappropriateunderstandingofaprioriinthiscontextbuildsinrelevantbackgroundknowledge. P&Rs claim that you cant have both uniform priors on topologies andon clades is correct; in fact, Velasco (2007) strengthens their proof by showing thaton any probability distribution on trees (not just the uniform one) not all clades canFig.2 A graph depicting the probability that a group of a given size forms a clade on a tree with 50taxawhenauniformpriorontopologiesisusedThepriorprobabilitiesofphylogenetictrees 4611 3beequallyprobable. Thereis nothingspecialaboutthe uniformprior on topologieswhichconictswithuniformpriorsoncladesuniformpriorsoncladesissimplyinconsistent. Having every possible clade be equally probable is not something thatwe could have even if it were desirable (which it isnt.) Once we see why this is so,itbecomeseasiertoseewhatconclusionsweshoulddrawfromit.Thereisaneasyexplanationforwhyitisimpossibleforeverypossiblecladetohaveanequal probabilityof forminganactual cladeonthetruetree. Inagivenproblemwithaxedset of taxa, theprobabilitythat agroupof aparticular sizeforms acladeis just theexpectednumber of clades of that sizedividedbythenumberofpossiblecladesofthatsize.Letsusetwospecicsizes(cladesofsize2and 3) as examples to show that they cant be equally probable. The fact that not alloftheprobabilitiescanbeequalcanbededucedfromthefollowingtwofacts:(1) Sincesmallercladesarenestedinsidelargerones,onanytree(andthereforeon the true tree), there are at least as many actual clades of size two as there areofsizethree. Therefore, onanyprobabilitydistributionovertrees:the expected number of clades of size 2Cthe expected number of clades of size 3.(2) Whenthereareatleastveleaftaxa:thepossiblenumberofcladesofsize2\thepossiblenumberofcladesofsize3.Therefore, (whenwehaveatleastvetaxa),theexpectednumberofcladesofsize 2thepossible numberofcladesofsize2 6theexpectednumberofcladesofsize 3thepossiblenumberofcladesofsize3Sonot all possibleclades of sizetwoor threecouldbeequallyprobableandafortiorinotallpossiblecladescanbeequallyprobable.To determine the actual numerical probabilities, we need to know two things: thenumbers of possible and actual clades of each size. The number of possible clades ofsizexisjustthenumberofpossiblewaysofchoosingagroupofsizexfromthecollection of n taxa which is just n choose x n!x!nx! : The number of actual cladesofagivensizewill dependonthetree. Auniformdistributionontreetopologiesyieldsaparticulardistributionontheexpectednumberofcladesofanyparticularsize rst calculatedinBrown(1994). The above facts are perhaps more easilyappreciatedbyattendingtothefollowinggraphsinFig. 3:Therstgraphplotshowthesizeofacladedeterminesthenumberofpossibleclades of that size. I have used n=50taxa as an example, but the shape of the curve isthe same for any number of taxa. Notice that the scale is logarithmic meaning that thereare vastly more possible clades of size 25 than, say, size 10. The second graph plotshow the size of a clade determines the expected number of clades of that size on theuniformdistribution on topologies. Since the probability of a clade is just the expectednumberofcladesofthatsizedividedbythenumberofpossiblecladesofthatsize(assuming all clades of the same size have the same probability), if the probability of aclade is to be the same for every size, these two curves must have the exact same shape(oneshouldbetheothermultipliedbyaconstanttheprobability).Noticethattheexpectedclades curveiscalculatedunder auniformprior ontopologies(asin462 J.D.Velasco1 3Picket and Randle 2005)for other topology distributions the curve varies in shapeslightly, but a fewaspects remain constant, such as the fact that its peak must be at size2. Since no distribution on trees gives it the same shape as the possible clades curve,the probabilities of all possible clades can never be identical. Aformal proof of this factis given in Velasco (2007).Sowhatshouldwemakeofthistheorem?ItmightbethoughtthatwehavejustshownthatBayesianismisaawedmethodology.Afterall,haventwejustshownthat it is impossible tomodel ignorance withrespect toclades since clades ofFig.3 Twographscomparingthenumberofpossiblecladesofagivensizetotheexpectednumberofcladesofthatsize.Theexpectednumberofclades(thevalueonthebottom)dividedbythenumberofpossible clades (the value on the top) is the probability that that group forms an actual clade. This gure istakenfromVelasco(2007)Thepriorprobabilitiesofphylogenetictrees 4631 3differentsizesmusthavedifferentprobabilities?Andisntthisobviouslybad?AsP&Rputit,Few, if any, systematists believe a priori that the probability of monophyly hasanythingtodowiththenumberoftaxahypothesizedtobemonophyletic.Certainly, the prior assertion that small clades and large clades are moreprobable than mid-sized clades lacks biological relevance. As such, a return tooptimalityperseiswarranted. (PickettandRandle2005:p.209)P&Raswell asGoloboffandPol (2005)andYang(2006)claimthat uniformpriors ontopologies introduce a bias infavor of smaller andlarger clades andagainstmediumsizedones.Wehavejustseenthatthisdisparityinprobabilitiesisguaranteedtooccurregardlessofourchoiceofpriorsontrees.Theirchoiceoftheword bias indicates that they think that this is a bad thing. P&R think this justiesabandoning Bayesianismas it is currently practiced and they suggest threealternative, incompatible methods for being Bayesian while attempting to articiallycorrect forthisbias. However, their conclusionthat usingpriorsintroducesanunacceptablebias intothe problemrests on a mistake. We want the probabilities ofcladestodependontheirsize. Articiallychangingtheposteriorsoralteringhowwe measure the strength of the evidence to correct for this would actually introducebias. Thereisbiological relevancetothefact that cladesofsizetwoshouldhavehigher priors thanthose of size threewe knowfromthe waythat clades areproducedthatcladeswithlargernumbersoftaxahavesmallercladeswithinthem.Basicmathematical factscombinedwithbackgroundbiological factsindicatethatwe should believethatgroups withmoretaxaarelesslikelytobeclades.Claimingignorance with regard to whether the true tree contains a particular clade of size twoor whether that tree contains a particular clade of size three is like claimingignorance with respect to whether some random integer is divisible by 2 or divisibleby4.Ignorancedoesnotentailequallyprobable.Toheadoff apossibleresponse, noticethat theideaof clades nestedwithinclades explains why smaller clades should be more probable, but this doesntexplainwhylargercladesalsohavehigherpriors. But thisisnot aproblem. Thehighprobabilityof verylarge clades is simplyanartifact of the designof theproblem. If our problem uses 10taxa, for nine of them to form a clade, all it takes isfor thetenthtobeoutsideof therest. However, that samegroupof ninetaxaismuch less likely to forma clade if the problemconsidered 50taxa. Unlike aproblemwith10taxa,with50taxa,cladesofsize8aremoreprobablethancladesof size 9. The bias toward very large clades essentially comes from assuming that alltaxa under consideration form a clade. Just as the conditional probability that A andBforma clade is relatively high given that A, B, and Cdo, the conditionalprobabilityofninetaxaformingacladeishighgiventhatweareactingasiftheyareinsideacladeof10taxa.Thisfactismoreeasilyappreciatedbyrecallingthatformingacladeisonlymeaningfulinthecontextofaparticularproblem.Foragrouptoformacladeinaparticularproblem, themembersofthegroupmustbemore closely related to each other than to any other taxa under consideration.HumansandGorillasformacladeaslongasChimpanzeesarenotoneofthetaxaunderstudy.Thereisnothingobjectionableaboutthiseither.464 J.D.Velasco1 3This argument shows that the probability of a clade must depend on its size, but ifwe do not carefully formulate the question, there might appear to be obviouscounterexamples. Ifwethinkofparticulargroups, it istemptingtoconcludethatP&R might be correct after allfor example, what should the prior probabilities ofmonophyly be for the following groups: apes, mammals, and vertebrates?Accordingto the above reasoning, the prior on apes shouldbe low, mammalsextremelylow, andvertebratesunbelievablytiny.Butouractualcondenceinthethreegroupsdoesnt appeartodependonsize. SoareP&Rcorrect afterall?No.There are several problems with the supposed analogy, but the major statistical erroristhat thisisaninstanceofsamplingbias. Ignorethefact that manysystematistswouldsimplydene these groups insucha wayas toguarantee that theyaremonophyleticandimaginethat weareworkingwithamoretraditional denitionbasedoncharactersorthinkofvertebratesasrigidlydesignatingsomeset oftaxawhichwecurrentlybelievearevertebrates. Thesampleisbiasedbecausewehaveselectedcladesthat haveahighposteriorprobabilityofbeingmonophyleticandthenweareaskedtoimaginewhat theirpriorsshouldbe. Forexample, theyeach have what appear to be uniquely derived characters. Of course clades ofdifferent sizescanhavethesameposteriorprobabilities. Butthisisnottheclaim.P&Rareclaimingthatbeforeweexaminearbitrarygroupsoftaxathatweknownnothing about, we should be equally condent that they are monophyletic regardlessof their size. But thisisabsurd. ImagineI assigneachof 100primatespecies adifferent numberandthenrandomlyselect someof thosenumbers. What arethechances that the numbers I have selectedpickout a monophyletic group? Thechanceswill obviouslyvarywiththenumberoftaxathat Iselect. IfIselect twoprimatespeciesatrandom,theoddsthatIhaveselectedamonophyleticgrouparelow, but theyarevastlyhigherthantheoddsthat Ihaveselectedamonophyleticgroupif I hadselectedftyrandomspecies. Yet thisisexactlyanalogoustothequestionunderconsideration. Sizedoesmatter.PossiblepriorsandtheprincipleofindifferenceTheaboveargument shows that wehavetobecareful whenwewishtomodelignorance, but it does not tell us how we actually ought to do so. We need to furtherconstraints toguide our priors. The above arguments onlyshowthat clades ofdifferent sizesshouldhavedifferent probabilitiesbut it isclearlycorrect that allpossiblecladesofsize2shouldbeequallyprobable, all possiblecladesofsize3shouldbeequallyprobable, etc. Inotherwords, ifweasksomequestionabout agroupofntaxathatareotherwiseunknowntous,itshouldntmatterwhichntaxawe select. If we want to know the probability that A is closer to B than to C or that Aand B coalesce in the past million years, it shouldnt matter which taxa A, B, and Crepresent. Distributionsthat satisfythisconditionarecalledlabel-invariant. Ifwewant to model ignorance with respect to the particular taxa we choose, we must usea label-invariant prior. While this is certainlyhelpful, it still leaves us withaninnite number of choices. For example, a uniform prior on topologies satises thiscondition,butsodomanydistributionsthatentailthatsomeshapehasprobabilityThepriorprobabilitiesofphylogenetictrees 4651 3one. Whilethesesecondtypesofdistributionsarecertainlyimplausible, wecantrulethemout simplyonthebasisoftheconditionthat wemust treat eachtaxonequally.Althoughuniformpriorsontopologiesaretypicallyused,wehavealreadyseenthatseveralauthorsbelievethatitleadstobiasedresultsthatcanbeuncoveredbyexaminingotherfactorssuchasparticularclades. Whileunequal priorsoncladesarenot agoodreasontogiveupuniformpriors ontopologies, perhaps lookingelsewherewill providejust suchareason. For example, withfour taxathereare15 different topologies12 have the pectinate A(B(C,D)) shape (this notationmeansthatCandDformacladewhichisnestedinsideB,C,andDwhichformaclade)whileonlythreehavethebalanced(A,B), (C,D)shapewheretherearetwoclades of size2. Souniformpriors ontrees introduces askeweddistributiononshapes. Isthisacceptable?Atraditional defensefor uniformpriorsontopologiesmight appeal totheprincipleof indifferencewhenthereisnoepistemicreasontopreferonetopologyoveranother, theyshouldall haveequal priors. Ofcoursemost versions of the principle of indifference have well-known problems andtypicallyleadtoinconsistency(Joyce2005), but theremaybesomelessgeneralprinciplewhichapplies inthis casethat isnt problematic. But evenaprincipletailored specically for phylogenetics is going to be question-begging in thiscontext as the obvious response is that there is a reason to weight topologiesdifferentlynamely,someshapesareconsistentwithmoretopologiesthanothers.If webelievedthat shapes shouldbeequallyprobable, this (together withlabelinvariance) would determine a particular distribution on topologies that favorstopologiesthataremorebalanced.Inaddition,wemightalsowishtoassignequalprobabilities to each labeled history. Each distribution is different so whichdistributionistobepreferred?In other cases in science where we think that there is a good answer to this type ofquestion,thecorrectpriorisalwaysdeterminedbylookingatthephysicalprocessthatgeneratesthevaluesfortheprobabilities.Inmanycases,theprocesscanvary.Selecting a day at random might yield a prior probability of 1/365 for anyparticular day being selected, but if the process of selection involves rst selecting amonth at random and then selecting a day within that month, the probabilities wouldbedifferent. Regardlessoftheprocess,thepointisthatifweknowthemethodofselection, then we can determine howto model ignorance. Assigning priors isproblematic only in cases where we do not have an understanding of the underlyingprocess.In the phylogenetic case, the tree is a result of the biological process of commonancestry and descent with modication. We want to know the probabilitydistributionthat results whena tree is producedbythis process. Trees are theresultofthesequencespassingdownfromorganismtoorganismviareproductiononthebranchesandsplittingatthenodeswhentheorganismgivesrisetomultipleoffspring which lead to different, extant taxa. A perfectly random branching processiscapturedbytheYulebirthprocessinwhichparticlesreproducewithaconstantprobability of giving birth per particle per unit time so the Yule birth process seemstheidealplacetostartourinvestigation.466 J.D.Velasco1 3TheYuleprocessIn1924,G.U.Yuledevelopedastatisticalmodeltohelpexplainwhysomegenerahave many more species than others (Yule 1925). The model was based on thinkingof speciation as lineage splittingone lineagegives birth to another without dying.Inthesimplestcase,theideaisthatwestartwithacommonancestorandthentheprobabilityof anyparticular lineage splittinginsome small unit of time is theconstant kdt.Twosplittingeventshappeninthesametimeperiodwithprobabilityo(dt). As time passes, there are more and more lineages present, each with the sameprobabilityofsplittinguntil wereachthenal result ofntaxa. Ifat eachsliceoftime, eachexistinglineagehasanequal chanceofsplitting, wecall theprocessaYulepurebirthprocess.Another way to think about this process is by looking at the present and workingbackwards. The coalescent process imagines n gene sequences existing at thepresent.Thenaswemovebackintimetheywillbegintocoalesce.Eachsequencehasanequalprobabilityofcoalescingwithanyotherparticularsequenceandthenwe gofromnton-1sequences andrepeat the process again. This process isobviouslyjusttheinverseofthebirthprocessandsothesamemathematicalrulesapply yieldingthe same probabilities for certainparameters such as shape andtopology(Kingman1982).Forourpurposes,wewanttoknowtheprobabilityofgettinga particulartreeasthe result of a Yule process. The answer is that a Yule process produces each labeledhistory with equal probability (Edwards 1970). Thus the distribution that eachlabeled historyshouldbeequally probable a priori canbegivena justication.Thejusticationisnottheoneprovidedbytheprincipleofindifference,whichsaysIcant thinkof a reasonwhyone labeledhistoryshouldbe more probable thananother. Rather, thejusticationis that if theevolutionof different taxais theresultofrandomlineagesplitting,thenfor nrandomtaxa,theprobabilitythattheyform a particular tree topology is proportional to the number of labeled histories thatareconsistentwiththattopology.One might be worriedthat we are ignoringextinction. We couldeasilyaddanother parameter l where the probability of any particular lineagegoing extinctisldt. This is known as a birthdeath process. Importantly, it leads to exactly the samedistributionontreetopologies. Aslongastheextinctionhappensrandomlyacrosslineages,thepriorprobabilitieswillbethesame(Thompson1975).Thepurebirthprocess, thebirthdeathprocess, andthecoalescentprocessallleadtoexactlythesamedistributionalllabeledhistoriesareequallyprobable.The idea that the Yule process represents a randomly branching tree is not newinthemathematical literature(Harding1971; Aldous2001). Thisideaisalsofairlystandard in the biological literature. The Yule birth process (or more typically a birthdeathprocess)iswidelyusedtostudymacroevolutionarytrends.Forexample,thediscovery of broad-scale biogeographical patterns and the detection of differences inspeciationorextinctionratesacrosslineagesarestandardlythought todependoncomparingtheacceptedphylogenytoanullmodelofrandombranching.Thenullmodel typically used for such comparisons is the Yule model (e.g. Mooers and Heard1997 and many of the very large number of references therein). The Yule process isThepriorprobabilitiesofphylogenetictrees 4671 3alsowidelyusedtostudymicroevolutionaryprocesses. Thestandardmethodofstudying intraspecies diversity will use a coalescent process to build gene genealogieswhichareessential totestinghypothesessuchasthoseconcerningthestrengthofselectionat aparticular siteor testingtheamount of geneowbetweendistinctpopulations (Halliburton 2004, Hein et al. 2005). Despite the near-universalacceptanceoftheYuleprocessbeingtheunderlyingphysicalprocessforcommondescent and therefore the production of phylogenetic trees, biologists virtually nevertake this process into account when actually constructing trees! (For exceptions, seeRannala and Yang 1996; Yang and Rannala 1997). The use of prior probabilities inBayesian phylogenetics makes thinking about the probabilities of trees unavoidable,but theideaof anull model for atreeis requiredeveninmethods that donotspecicallyattempt touseaprior probabilitydistribution. Asweshall seelater,ignoringthesefactscanleadtomistakenconclusionsnotonlyinconstructingtreeswhich are best supported by the evidence, but also when we attempt to use those treestomakefurtherinferencesabouttheevolutionaryprocess.Theoretically,itiswellmotivatedtostartinsistingonsuchachangeinmethodology,butInowturntothequestion of what, if any, consequences making such a change will actually have.We have already noted that the Yule distributionthe probability distributionof trees inducedbyaYuleprocessis adifferent distributionthantheuniformdistribution.Withfourtaxa,thereare15topologiesand18labeledhistories.Sincesometopologies(thosewiththepectinate, asymmetricshape) areconsistent withonly one labeled history and some are consistent with two (the balanced shape), thepriorsshiftfrom1/15ontheuniformtopologytoeither1/18or2/18dependingonwhether wearelookingat theasymmetricor thebalancedtree. Ingeneral, moreasymmetric topologies will have their prior probabilities lowered and more balancedtrees will havetheirsraised.There are manywaysthat the overall balanceofa treecouldbemeasured(MooersandHeard1997), but certainlyintheclearcases, theresultofaYuleprocessisthatatreethatismorebalancedwillbeconsistentwithmore labeled histories (there are more pathways to reach it) and thus is moreprobablethananyparticularunbalancedtree.The idea that balanced trees are consistent with more labeled histories andtherefore are more probable than unbalanced trees is exactly analogous to the claimthatifweipafaircoin100times,wearemorelikelytoget50headsthansomeother number of heads. If the coin is fair, each particular sequence of heads and tailsis equally probable. Since 0 heads is only consistent with one sequence, it is far lessprobablethan50headswhichisconsistent with &1029sequences. Animportantside note is that we should not conclude that the Yule process will probably result ina balanced tree. The appropriate conclusion to draw is that Pr(T1|T1isbalanced) [Pr(T2|T2isunbalanced)not that Pr(Treewill bebalanced) [Pr(Treewill beunbalanced). Far fewer treetopologies arebalancedthanunbalanced, soeven though each has a higher probability than those that are unbalanced, theunconditionalprobabilitythatatreeisbalancedisstillrelativelylow.So we knowthat if we replace uniformpriors with Yule priors, the priorprobabilities of unbalanced trees will go down while those of balanced trees will goup.Butdoesthisdifferencereallymattertotheirposteriorprobabilities?Thiswilldependontheparticular problem. Problemscanbeconstructedwherethepriors468 J.D.Velasco1 3matter. Problems can be constructed where they dont. With enough data, thelikelihoods of thevarious trees will completelyswampdifferences inthepriorsbetween trees.Buthowmuch dataisrequired andjusthowmuch thisdifferenceinpriorsmattersinrealisticcasesissomethingthat will requirecareful quantitativeinvestigation.It is widely known that the number of possible trees with n taxa=2n 3!! Pn22n 3 (Felsenstein 2004). Steel and McKenzie (2001) provide arecursive algorithm for calculating the number of labeled histories consistent with aparticular topology. For each vertex v (node) let d(m) be the number vertices that areits descendants (including itself). Note that this is the same as the number of taxa inthesubtreeformedbythat nodeminus 1. Now, thenumber of labeledhistoriesconsistent withanyparticulartreetopology= n1!Pvdv. Forexample, thenumberoflabeledhistoriesconsistentwiththeperfectlybalancedfourtaxatree=41!311 2.Combinedwiththeformulaforthetotalnumberofpossiblelabeledhistoriesforntaxa:n!n1!2n1(Edwards 1970) we nowcancalculate the prior probabilityof anyparticular treeunder theYulemodel. Toseedirectlywhether thiswill affect theposterior probability of any individual tree, we would need to calculate thenormalizingconstantPr(Data)whichwecant do. AnotheroptionistorunanMCMC on some particular data set with uniform priors as is typically done and thenrun the MCMCon the same data set with Yule priors insteadof uniformpriors andsimplycheckfordifferencesintheresults. Thismethodrequiresustorecalculatethe entireposterior distribution justto see if therewillbe anysignicant differenceintheposteriorsofparticulartrees. Butthereisanothermethodthatcantellusatleastsomeofwhatwewanttoknow.Imaginethatweperformthecalculationswithuniformpriorsandgettheresultthat T1 has a higher posterior probability than T2. How probable is it that the resultswould be differentif we used Yule priors instead? For the order to switch, the ratioof the posteriors would have to switch from being greater than 1 to being less than 1.By Bayes Theorem, the ratio of the posterior probabilities is equal to the ratio of thepriorstimestheratioofthelikelihoods:Bayes TheoremOdds Ratio formPrT1jDPrT2jD PrDjT1PrDjT2PrT1PrT2Since the likelihoods themselves will not change, we candirectlycalculate theeffect of changing the priors. Since the old prior ratio was 1:1, if we want to switchthe ordering on trees, we need the new prior ratio to be greater than the reciprocal ofthe likelihood ratio. So how large is the ratio of the priors? In the fourtaxa case, themost balancedtoleast balancedratiois only2:1. But likeall other effects thatdepend on the number of possible trees, this is going to increase combinatorially. Togive an extreme example, the perfectly balanced tree with 64taxa (it splits into twosubtreesof32,eachofthosesplitsintotwosubtreesof16,etc.)isconsistentwith631!63312154783162:61 1063labeledhistories. Sincethemaximallyunbalancedtree,whichhassplitsof1:63then1:62,then1:61,etc.,isconsistentwithonlyonelabeledhistory, 1063is alsotheratioof theprior probabilities of thetrees. Forn=128,thisratiorisesto &4.1910163.WhilethelikelihoodratiocaneasilybeThepriorprobabilitiesofphylogenetictrees 4691 3greater thanthis for several thousandindependent sites, these massive numbersshouldcertainlygivepausetoanyonewhoclaimsthatusingdifferentpriorswouldnot makeanydifference. Certainlytheywill makesomedifferencetotheoverallposteriordistribution.Exactlyhowmuchdifferencetheywillmakewilldependonthe particular case and general conclusions will require further study. Regardless ofhowoftenchangingthepriorsdramaticallyaffectstheposteriors, wehaveatleastthe beginnings of an explanation of why we should use one prior probabilitydistributionratherthananother.Animportantfeatureofthisdiscussionisthatitisnotessentialtothisargumentthat theYuleprocessperfectlycapturesthecausal processbywhichevolutionarytreesareproduced. Inparticular, itisclearlyunrealisticthatatagiventime, eachextantlineagehasthesameprobabilityofsplitting.Thispapertakestheimportantrst step of using priors that at least attempt to be biologically relevant. There is noknownbiological processthat wouldleadtoauniformdistributionontopologies.As such, there can be no justication for using these priors. In addition, we can thinkof theYulemodel withnoother effectsasthesimplest amongawholeclassofbranching models which might be used to generate priors. Further biologicalinvestigation can help us improve our branching models and thus improve theaccuracyofourpriorprobabilities. TheYulemodel, not theuniformmodel, willformtheessentialbackboneofanysuchfutureinvestigations.In addition, it needs to be pointed out that Yule process models how all of the tipsresultingfromacommonancestorareexpectedtoberelated.Thismeansthattaxasampling will severely affect the model. If we are examining all of the tips that havedescended from some ancestor, then the Yule process will be adequate. Similarly, ifwerandomlysampletips, thiswillnot affect thedistribution. However, ifweusesome non-randommethodsuchas samplingtwoorganisms fromeachspeciesunder investigationit is easytosee that we shouldexpect the tree tohave adifferent shape. Aclearinstanceofthisistheuseofoutgroupswhichguaranteesthat the tree will be very imbalanced at the rootsomething that is improbable on aYule model. Depending on how we sample, we might be able to correct this bias (inthetwoabovecases, thisiseasy).Butoftentimes,wesampletaxanon-randomly,but not byanswer process whichwe canbuildintoour model. But this entirediscussionsimplyreinforcesthepointthatitisessentialtorealizethatnotalltreesareequallyprobableapriori andthat this fact canaffect our results whenit isignored.Thebase-ratefallacySince using different priors on topologies could lead to different results in particularphylogenetic studies, it could also lead to different results in studies that usephylogenies to make further biological inferences. This is particularly relevant sinceIhavearguedthat phylogeniesproducedwithout attendingtotheYulemodel arenot just in error, but that they are in error in a particular way. I will now examine anexample of this error. The base-rate fallacy is a common mistake made in everydayreasoning. That mistakeistoignorethebase-rate, or prior probability, of events470 J.D.Velasco1 3when making inferences. Here is a standard example of medical testing often foundintheliterature.We takea random personin the UnitedStatesandadminister an HIV test whichis accurate in 95% of all cases. The test shows up positive. The proper conclusion todrawisthatthispersonprobablydoesnothaveHIV.WecanreachthisconclusionbynotingthatthepriorprobabilitythattheyhaveHIVcanbe approximatedbythebase-rate of HIV in the population. In 2005, the CDC estimated that there are about438,000 people living with HIVout of over 300million in the US and itsdependenciesgivingusapriorprobabilityof.00146(CentersforDiseaseControlandPrevention2005). ByBayesTheorem,PrHIVj test PrtestjHIV PrHIVPrtest

0:95 0:001460:95 0:00146 0:05 0:998540:027Inotherwords, thereisonly2.7%chancethat thispersonactuallyhasHIV. Theexplanation is simple5% of the people who dont have HIV will get a positive testresult and this group is much larger than the group of individuals who actually haveHIV. Certainly, thepositivetest result raisestheprobabilitythat thispersonhasHIV. In fact, it raises it by a factor of almost 20but this only raises the probabilityfrom0.15%toabout 2.7%. Ingeneral, ifthefalse-positiverateishigherthanthebase-rate, thentherewill bealess than50%chancethat theyactuallyhavethediseaseinquestion. If welookat onlythelikelihoodof havingHIV(0.95) andignorethebase-rate, wearecommittingthebase-ratefallacy. Whileignoringveryskewedbase-ratesisparticularlybad, itisimportanttorealizethatitisalwaysanerrortoignorebase-ratesregardlessofwhattheyare.While the debate over how to assign prior probabilities might be seen as a debateinternal to Bayesianism, understanding the underlying process that generatesphylogeniesisessentialtomakingcorrectinferencesregardlessofmethodology.Ifthe Yule process truly underlies the production of phylogenetic trees, then to ignoreit asParsimonyandMaximumLikelihoodmethodsdoisakintocommittingthebase-rate fallacy. Similarly, using a prior distribution, but using the wrong one suchaswhentheuniformdistributionisused,leadstothewrongconclusions.Ifwearelucky enough to have data which show a very strong signal for particular clades, thedata will overcome the bias that these mistakes introduce, but this will certainly notbethecaseineveryinstance.Asapractical exampleofthiserror, thereisalargeliteratureonhowtomakeinferencesbasedontheshapesoftreesandtheconsensusintheeldisthattrees(basedonpublishedphylogenies) seemtobe more asymmetric thanwe wouldexpect bychance(HuelsenbeckandKirkpatrick1996; MooersandHeard1997).This has lead systematists to conclude, among other things, that effects such as cladeselection are prevalent and that phylogenies are not just the result of randombranching. What wewouldexpect bychance is(appropriately) determinedbyexaminingaYuledistribution, but thepublishedphylogeniestypicallydonot usepriorprobabilitiesandiftheydo,theyuseauniformdistributionwhichisskewedThepriorprobabilitiesofphylogenetictrees 4711 3toward asymmetry relative to the Yule distribution. Since the Yule process representsrandom branching, the use of uniform priors on topologies (or the use of no priors atall)havebiasedtheresultsinfavor ofmoreasymmetrictrees. Inotherwords, weshould expect the result that published phylogenies are more asymmetric thanexpected by the Yule process. By thinking about the base-rate fallacy, we can see thatif our data leads us to conclude that a tree is unbalanced, this might be a case where itismoreprobablethatthetreeismorebalanced, butthatthedataismisleading. Ofcourse not every case will be a false positive, effects such as clade selection and taxasampling bias certainly do affect inferred tree shapes, but the above analysis points toan important project that still needs to be done reexamining the data on tree shapesto see just how much of the apparent difference between actual history and randomlyproduced trees is simply an artifact of getting the history wrong in the rst place duetoignoringtheprocessbywhichtreesaregenerated.Acknowledgements I amgrateful to David Baum, Matt Haber, James Justus, Bret Larget, GregNovack, and especially Elliott Sober for their support and many helpful comments on earlier drafts of thispaper.Thanksalsotoananonymousreviewerwhomadeseveralhelpfulcommentswhichimprovedthepresentationofthispaper.ReferencesAldous DJ (2001) Stochastic models and descriptive statistics for phylogenetic trees, from yule to today.StatSci16(1):2334BrownJKM(1994)Probabilitiesofevolutionarytrees.SystBiol43(1):7891CentersforDiseaseControlandPrevention(2006)HIV/AIDSSurveillanceReport, 2005, vol17. U.S.Department of Health and Human Services, Centers for Disease Control and Prevention, Atlanta, pp154.Alsoavailableathttp://www.cdc.gov/hiv/topics/surveillance/resources/reports/2005report/EdwardsAWF(1970)Estimationofthebranchpointsofabranchingdiffusionprocess.JRStatSocB(Methodological)32(2):155174FarrisJS(1983)Thelogicalbasisofphylogeneticanalysis.AdvCladistics2:736FelsensteinJ(2004)Inferringphylogenies.SinauerAssociates,Sunderland,MassGoloboff PA, Pol D(2005) ParsimonyandBayesianphylogenetics. In: Victor AA(ed) Parsimony,phylogenyandgenomics.OxfordUniversityPress,Oxford,pp148159HaberMH(2005)Onprobabilityandsystematics: possibility, probability, andphylogeneticinference.SystBiol54(5):831841HalliburtonR(2004)Introductiontopopulationgenetics.Pearson/PrenticeHall,UpperSaddleRiverHardingEF(1971)Theprobabilitiesofrootedtree-shapesgeneratedbyrandombifurcation. AdvApplProbab3(1):4477HeinJ, SchierupM, WiufC(2005)Genegenealogies, variationandevolution: aprimerincoalescenttheory.OxfordUniversityPress,OxfordHowson C,Urbach P(2005) Scientic reasoning: theBayesian approach, 3rd edn. OpenCourt, La SalleHuelsenbeckJP, KirkpatrickM(1996) Dophylogenetic methods producetrees withbiasedshapes?Evolution50(4):14181424HuelsenbeckJP, Ronquist F(2001) MRBAYES: Bayesianinferenceof phylogenetictrees. Bioinfor-matics(Oxford)17(8):754755HuelsenbeckJP, Ronquist F(2005) Bayesiananalysis of molecular evolutionusingMr. Bayes. In:NielsonR(ed)Statisticalmethodsinmolecularevolution. Springer,NewYork,pp183232HuelsenbeckJP, Ronquist F, NielsenR, BollbackJP(2001) Bayesianinferenceof phylogenyanditsimpactonevolutionarybiology.Science(WashingtonDC)294(5550):23102314JoyceJ(2005)Howprobabilitiesreectevidence.PhilosPerspect19:153Kadane JB (2006) Is objective Bayesian analysis objective, Bayesian, or wise? (comment on articles byBergerandbyGoldstein).BayesianAnal1(3):433436472 J.D.Velasco1 3KingmanJFC(1982)Thecoalescent. StochasticProcessAppl13(3):235248Kluge AG(2005) What is the rationale for Ockhams razor (a.k.a. parsimony) in phylogeneticinference? In: Albert VA(ed) Parsimony, phylogeny and genomics. Oxford University Press,Oxford,pp1542Larget B (2005) Introduction to Markov Chain Monte Carlo methods in molecular evolution. In: NielsonR(ed)Statisticalmethodsinmolecularevolution.Springer,NewYork,pp4461Larget B, Simon DL(1999) Markov chain Monte Carlo algorithms for the Bayesian analysis ofphylogenetictrees.MolBiolEvol16(6):750759Metzker ML, Mindell DP, Liu XM, Ptak RG, Gibbs RA, Hillis DM (2002) Molecular evidence of HIV-1transmissioninacriminalcase.ProcNatAcadSci99(22):1429214297MooersAO,HeardSB(1997)Inferringevolutionaryprocessfromphylogenetictreeshape.QRevBiol72(1):3154Pickett KM, RandleCP(2005) StrangeBayes indeed: uniformtopological priors implynon-uniformcladepriors.MolPhylogenetEvol34(1):203211PitnickS,JonesKE,Wilkinson GS(2006)Mating systemandbrainsizeinbats.ProcRSocBiolSciB273(1587):719724Poux C, Madsen O, Marquard E, Vieites DR, de Jong WW, Vences M (2005) Asynchronous colonizationofMadagascarbythefourendemiccladesofprimates,tenrecs,carnivores,androdentsasinferredfromnucleargenes.SystBiol54(5):719730RandleCP,MortME, CrawfordDJ(2005)Bayesianinferenceofphylogeneticsrevisited:developmentsandconcerns.Taxon54(1):915RannalaB, YangZ(1996) Probabilitydistributionof molecular evolutionarytrees: anewmethodofphylogeneticinference.JMolEvol43(3):304311SempleC,SteelM(2003).Phylogenetics. OxfordUniversityPress,OxfordSiddallME, KlugeAG(1997).Probabilismandphylogeneticinference.Cladistics13(4):313336SoberE(1988).Reconstructingthepast.Parsimony,evolution,andinference.MITPress,CambridgeSteel M, McKenzie A (2001). Properties of phylogenetic trees generated by yule-type speciation models.MathBiosci170(1):91112ThompsonEA(1975)Humanevolutionarytrees.CambridgeUniversityPress,CambridgeVelascoJD(2007)Whynon-uniformpriorsoncladesarebothunavoidableandunobjectionable. MolPhylogenetEvol45:748749YangZ(2006)Computationalmolecularevolution.OxfordUniversityPress,OxfordYangZ, RannalaB(1997). BayesianphylogeneticinferenceusingDNAsequences: aMarkovChainMonteCarlomethod.MolBiolEvol14(7):717724YuleGU(1925)Amathematicaltheoryofevolution, basedontheconclusionsofDr. JCWillis, FRS.PhilosTransRSocLondB213:2187Thepriorprobabilitiesofphylogenetictrees 4731 3