Edinburgh Research Explorer
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
Citation for published version: Suchard, M. A., Lemey, P., Baele, G., Ayres, D. L., Drummond, A. J., & Rambaut, A. (2018). Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution, 4(1), vey016. https://doi.org/10.1093/ve/vey016
Digital Object Identifier (DOI): 10.1093/ve/vey016
Document Version: Publisher's PDF, also known as Version of record
Published In: Virus Evolution
Publisher Rights Statement: © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10
Marc A. Suchard1,2,3,†, Philippe Lemey4,‡, Guy Baele4,§, Daniel L. Ayres5, Alexei J. Drummond6,7 and Andrew Rambaut8
1Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, 621 Charles E. Young Dr. South, Los Angeles, CA 90095, USA; 2Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, 650 Charles E. Young Dr. South, Los Angeles, CA 90095, USA; 3Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Dr. South, Los Angeles, CA 90095, USA; 4Department of Microbiology and Immunology, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium; 5Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 125 Biomolecular Science Bldg 296, College Park, MD 20742, USA; 6Department of Computer Science, University of Auckland, 303/38 Princes St, Auckland 1010, NZ; 7Centre for Computational Evolution, University of Auckland, 303/38 Princes St, Auckland 1010, NZ; and 8Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh EH9 3FL, UK
Corresponding author: E-mail: msuchard@ucla.edu (M.A.S.); alexei@cs.auckland.ac.nz (A.J.D.); a.rambaut@ed.ac.uk (A.R.)
†http://orcid.org/0000-0001-9818-479X
‡http://orcid.org/0000-0003-2826-5353
§http://orcid.org/0000-0002-1915-7732
http://orcid.org/0000-0003-4337-3707
Abstract
The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. A convenient cross-platform graphical user interface allows the flexible construction of complex evolutionary analyses.
Key words: phylogenetics; phylodynamics; Bayesian inference; Markov chain Monte Carlo.
1 Introduction
First released over 14 years ago, the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become firmly established in a broad diversity of biological fields, ranging from phylogenetics and paleontology to population dynamics, ancient DNA, and the phylodynamics and molecular epidemiology of infectious disease (Drummond et al. 2012). BEAST's specific focus on time-scaled trees, and on the evolutionary analyses that depend on them, has given it a unique place in the toolbox of researchers in molecular evolution and phylogenetics. Since its inception, a strong motivation for BEAST development has been
the rapid growth of pathogen genome sequencing as part of public health responses to infectious diseases (Grenfell et al. 2004). In particular, fast-evolving viruses can now be tracked in near real-time (see, e.g., Quick et al. 2016) to understand their epidemiology and evolutionary dynamics.
In BEAST version 1.10, we have introduced a series of advances with a particular focus on delivering accurate and informative insights for infectious disease research through the integration of diverse data sources, including phenotypic and epidemiological information, with molecular evolutionary models. These advances fall into three broad themes: the integration of diverse sources of extrinsic information as covariates of evolutionary processes; the increased flexibility and modularization of the model design process, with robust and accurate model testing methods; and substantial improvements in the speed and efficiency of statistical inference.
2 Data integration
Many traits in phylogenetics are represented as, or partitioned into, a finite number of discrete values, with geographical location standing out as a popular example. Because BEAST is dedicated to sampling time-scaled phylogenies, new developments in discrete character mapping enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. By extending the discrete diffusion models to incorporate empirical data as covariates or predictors of transition rates, BEAST can simultaneously test and quantify a range of potential predictive variables of the diffusion process (Lemey et al. 2014). In addition, realizations of the trait transition process can be efficiently produced to pinpoint the nature and timing of changes in evolutionary history beyond ancestral node-state reconstruction (termed Markov jumps), or to infer the time spent in a particular state (Markov rewards) (Minin and Suchard 2008). For molecular data, fast stochastic mapping approaches are also employed to obtain site-specific dN/dS estimates, integrating over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty in these measures of the selective forces acting on individual codons (Lemey et al. 2012).
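Both Markov jumps and Markov rewards reduce to branch-wise integrals of the form ∫₀ᵗ e^{Qs} W e^{Q(t−s)} ds for a trait rate matrix Q and branch length t. As a minimal sketch, the example below evaluates that integral with Van Loan's block-matrix exponential, a swapped-in technique rather than the eigendecomposition approach of Minin and Suchard (2008), and it is not BEAST's code. The function names `branch_expectation` and `jump_weights` and the two-state rate matrix are hypothetical. Setting W to the rate matrix restricted to the transitions of interest counts jumps; setting W to a diagonal reward matrix measures time spent in states.

```python
import numpy as np
from scipy.linalg import expm


def branch_expectation(Q, W, t):
    """Return (J, P) for a CTMC with rate matrix Q over a branch of length t.

    J[a, b] = integral_0^t [e^{Qs} W e^{Q(t-s)}]_{ab} ds is the joint expectation
    E[count or reward ; X(t) = b | X(0) = a]; P = e^{Qt} holds the transition
    probabilities, so J / P conditions on both branch endpoints.
    The integral is the upper-right block of a 2n x 2n matrix exponential
    (Van Loan's method).
    """
    n = Q.shape[0]
    block = np.zeros((2 * n, 2 * n))
    block[:n, :n] = Q
    block[:n, n:] = W
    block[n:, n:] = Q
    E = expm(block * t)
    return E[:n, n:], E[:n, :n]


def jump_weights(Q, labeled):
    """Rate matrix restricted to the labeled (from, to) transitions, zero elsewhere."""
    W = np.zeros_like(Q)
    for i, j in labeled:
        W[i, j] = Q[i, j]
    return W


# Hypothetical two-state trait (e.g. two locations) with unit rates in both directions.
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
t = 0.7

# Markov jumps: count every 0->1 and 1->0 change along the branch.
J_jumps, P = branch_expectation(Q, jump_weights(Q, [(0, 1), (1, 0)]), t)
conditional_jumps = J_jumps / P  # E[number of jumps | X(0)=a, X(t)=b]

# Markov rewards: a reward of 1 per unit time spent in state 0.
J_time0, _ = branch_expectation(Q, np.diag([1.0, 0.0]), t)
print(conditional_jumps)
print(J_time0.sum(axis=1))  # expected time in state 0 given only the starting state
```

Summing the joint expectations over end states recovers unconditional expectations; with unit exit rates from both states, the expected number of jumps over the branch is simply t.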
Multivariate continuous traits are incorporated using phylogenetic Brownian diffusion processes that model both the shared ancestral dependence across taxa and the correlations between these variables. Such continuous models have most frequently been applied to diffusion on a geographical landscape, with the traits representing coordinates and the phylogeny reconstructing the epidemiological process within the host population (Lemey et al. 2010). The landscapes can also represent other spaces: integration of antibody binding assay data has extended 'antigenic cartography' (Smith et al. 2004) approaches to model simultaneous antigenic and genetic evolution and to infer viral trajectories in the immunological space generated by the host population (Bedford et al. 2014).
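Under multivariate phylogenetic Brownian diffusion with a fixed root value, the tip traits are jointly Gaussian: the taxon-by-taxon covariance is the matrix C of shared root-to-tip path lengths, and the trait-by-trait covariance is a diffusion matrix Σ, so a column-stacked n × p tip-trait matrix has covariance Σ ⊗ C. The sketch below assumes a tiny hard-coded tree in a hypothetical dictionary encoding, a known root mean, and made-up tip coordinates. It forms the dense covariance only for clarity; practical implementations generally avoid this and BEAST additionally integrates over trees.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical rooted tree ((A:1,B:1):1,C:2); each node maps to (parent, branch length).
TREE = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "C": ("root", 2.0),
}
TIPS = ["A", "B", "C"]


def path_to_root(node):
    """Map every node on the path from `node` up to the root to its branch length."""
    path = {}
    while node is not None:
        parent, length = TREE[node]
        path[node] = length
        node = parent
    return path


def shared_path_matrix(tips):
    """C[i, j] = total length of branches shared by the root-to-tip paths of i and j."""
    paths = [path_to_root(tip) for tip in tips]
    n = len(tips)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            shared = paths[i].keys() & paths[j].keys()
            C[i, j] = sum(paths[i][k] for k in shared)
    return C


def bm_log_likelihood(X, root_mean, Sigma, C):
    """Log density of an n x p tip-trait matrix X under multivariate Brownian motion."""
    n, p = X.shape
    mean = np.repeat(root_mean, n)  # column-stacked: trait 1 for all tips, then trait 2, ...
    cov = np.kron(Sigma, C)         # Cov(vec X) = Sigma (x) C
    return multivariate_normal(mean=mean, cov=cov).logpdf(X.flatten(order="F"))


C = shared_path_matrix(TIPS)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])             # hypothetical diffusion covariance
X = np.array([[0.3, -0.1], [0.5, 0.2], [-1.0, 1.4]])   # hypothetical 2-D tip locations
print(C)
print(bm_log_likelihood(X, np.array([0.0, 0.0]), Sigma, C))
```

The off-diagonal entries of Σ carry the between-trait correlation (for example, between latitude and longitude), while C carries the shared ancestry; with a diagonal Σ the log-likelihood factorizes into independent per-trait terms.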
Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may, however, be unrealistic for many evolutionary problems (including geographical reconstruction). A recently developed relaxed directional random walk allows the diffusion process to take on different directional trends in different parts of the phylogeny while preserving model identifiability (Gill et al. 2017), opening up these processes to a wide range of applications. BEAST 1.10 also extends multivariate phylogenetic diffusion to latent liability model formulations in order to assess correlations between traits of different data types, including (various combinations of) continuous, binary, and discrete traits (Cybis et al. 2015), as demonstrated by applications to flower morphology, antibiotic resistance, and viral epitope evolution. To infer correlations between high-dimensional traits in a computationally efficient way, a novel phylogenetic factor analysis approach assumes that a small, unknown number of independent evolutionary factors evolve along the phylogeny and generate clusters of dependent traits at the tips (Tolkoff et al. 2018).
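In a latent liability formulation, each observed binary or discrete trait is treated as a deterministic function of continuous liabilities that themselves evolve by Brownian diffusion, so traits of different types can share one correlation structure. The sketch below assumes common thresholding conventions for this class of model: a binary trait equals 1 when its liability is positive, and a K-state trait uses K − 1 liabilities, taking the state of the largest liability when it is positive and a reference state otherwise. These conventions, the function names, and the liability values are assumptions for illustration; the liabilities are supplied directly rather than sampled along a tree.

```python
import numpy as np


def binary_from_liability(z):
    """Observed binary trait: 1 where the latent liability is positive, else 0."""
    return (np.asarray(z) > 0).astype(int)


def discrete_from_liabilities(z):
    """Observed K-state trait from K-1 liabilities stored along the last axis.

    State 0 is the reference state, returned when every liability is negative;
    otherwise the state is 1 + the index of the largest liability.
    """
    z = np.asarray(z)
    best = np.argmax(z, axis=-1)
    return np.where(np.max(z, axis=-1) > 0, best + 1, 0)


def consistent(observed_binary, observed_discrete, z_binary, z_discrete):
    """Check whether proposed liabilities reproduce the observed tip data."""
    return (
        np.array_equal(binary_from_liability(z_binary), observed_binary)
        and np.array_equal(discrete_from_liabilities(z_discrete), observed_discrete)
    )


# Hypothetical liabilities for four taxa: one binary trait and one 3-state trait.
z_bin = np.array([0.8, -0.2, 1.5, -1.1])
z_disc = np.array([[-0.5, -0.3], [0.4, 1.2], [0.9, 0.1], [-2.0, 0.3]])
print(binary_from_liability(z_bin))       # [1 0 1 0]
print(discrete_from_liabilities(z_disc))  # [0 2 1 2]
```

During inference, a consistency check of this kind defines which liability values are admissible given the observed tips; correlations between traits are then estimated on the continuous liability scale.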
Further extending the data integration approach, BEAST 1.10 includes a flexible framework for incorporating time-varying covariates of the effective population size. This framework uses Gaussian Markov random fields to reconstruct smoothed effective population size trajectories while simultaneously estimating to what extent predictor variables (e.g. fluctuations in climatic factors, host mobility, or vector density) may have driven the dynamics (Gill et al. 2016). Using a similar generalized linear modeling (GLM) approach, classical epidemiological time-series data such as case counts (Gill et al. 2016) can be integrated with pathogen genome sequence data to provide joint inference of important epidemiological parameters.
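A minimal sketch of the covariate idea, assuming a first-order random-walk Gaussian Markov random field on the log effective population size γ over a grid of time points, with covariates entering as a linear mean, γ = Zβ + w. The function name and the toy grid are hypothetical. In a full analysis the coalescent likelihood links γ to the tree; this sketch shows only the smoothing prior evaluated on the residuals w = γ − Zβ, up to an additive constant because the random-walk prior is improper.

```python
import numpy as np


def gmrf_covariate_log_prior(gamma, Z, beta, precision):
    """Unnormalized log prior of log-population sizes gamma on a time grid.

    gamma     : (M,) log effective population sizes at M grid points
    Z         : (M, K) covariate values (e.g. a hypothetical climate or mobility series)
    beta      : (K,) regression coefficients
    precision : GMRF smoothing precision tau
    The residuals w = gamma - Z @ beta follow a first-order random walk, so only
    successive differences are penalized; additive constants are dropped.
    """
    w = gamma - Z @ beta
    increments = np.diff(w)
    return 0.5 * increments.size * np.log(precision) - 0.5 * precision * np.sum(increments**2)


# Hypothetical grid of 5 time points with one standardized covariate.
Z = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0]])
beta = np.array([0.8])
gamma_tracks_covariate = Z @ beta + 2.0  # log Ne follows the covariate around a constant level
gamma_flat = np.full(5, 2.0)             # log Ne that ignores the covariate
tau = 4.0
print(gmrf_covariate_log_prior(gamma_tracks_covariate, Z, beta, tau))
print(gmrf_covariate_log_prior(gamma_flat, Z, beta, tau))
```

Given a non-zero β, a trajectory that tracks the covariate is treated as perfectly smooth and receives the higher prior density; estimating β jointly with γ is what quantifies how much of the population-size dynamics the predictor explains.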
Finally, recent host-transmission models allow the integration of complete or partial knowledge of a pathogen's transmission history, enabling the simultaneous inference of within-host population dynamics, viral evolutionary processes, and transmission times and bottlenecks (Vrancken et al. 2014). Likewise, other priors enable the reconstruction of transmission trees of infectious disease epidemics and outbreaks while accommodating phylogenetic uncertainty, employing a newly designed set of phylogenetic tree proposals that respect node partitions (Hall et al. 2015).
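Hall et al. (2015) describe transmission trees as partitions of the phylogeny's node set, with each element corresponding to one host. The sketch below checks one plausible set of constraints on such a labelling: each tip belongs to the host it was sampled from, and each host's element is connected, meaning exactly one of its nodes has a parent in another host (or no parent, at the root). The tree encoding, host labels, and function name are hypothetical, and the full constraints and proposals in that work may differ.

```python
from collections import defaultdict

# Hypothetical tree: node -> parent (None for the root).
PARENT = {"root": None, "n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}


def is_valid_partition(host_of, tip_host):
    """Check that a node-to-host labelling forms connected, tip-consistent elements.

    host_of  : dict node -> host label for every node in PARENT
    tip_host : dict tip -> host the sample was taken from
    An element is connected when exactly one of its nodes has a parent outside the
    element (or no parent at all): the point where that host's lineage enters.
    """
    if any(host_of[tip] != host for tip, host in tip_host.items()):
        return False
    entry_points = defaultdict(int)
    for node, parent in PARENT.items():
        if parent is None or host_of[parent] != host_of[node]:
            entry_points[host_of[node]] += 1
    return all(count == 1 for count in entry_points.values())


tips = {"t1": "A", "t2": "B", "t3": "C"}
valid = {"root": "A", "n1": "A", "t1": "A", "t2": "B", "t3": "C"}
broken = {"root": "B", "n1": "A", "t1": "A", "t2": "B", "t3": "C"}
print(is_valid_partition(valid, tips))   # True
print(is_valid_partition(broken, tips))  # False
```

In the valid labelling, host A infects both B and C; in the broken one, host B's nodes are split into two disconnected pieces, which no single infection event could produce. Tree proposals that respect node partitions keep every proposed state inside this valid set.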
3 Flexible model design
BEAST's companion graphical user interface program, BEAUti, allows the user to import data, select models, choose prior distributions, and specify the settings for both Bayesian inference and marginal likelihood estimation. Our efforts on BEAUti 1.10 have focused on allowing the user to easily link or unlink substitution, clock, and tree models across multiple partitions, as well as to link individual parameters, providing considerable adaptability in model design. Additionally, BEAUti can group various parameters under a hierarchical phylogenetic model prior (Suchard et al. 2003), which allows parameters to take different values while linking them through a common distribution whose parameters can themselves be inferred. For example, flexible codon model parameterizations using hierarchical phylogenetic models (Baele et al. 2016b), and the incorporation of a range of potential predictive variables for substitution behaviour (Bielejec et al. 2016a), provide insight into the tempo and mode of pathogen evolution.
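A hierarchical phylogenetic model can be pictured as partition-specific parameters drawn from a shared distribution whose hyperparameters are also estimated. As an illustrative assumption, the sketch places a log-normal hierarchical prior on hypothetical per-partition transition/transversion ratios κ_p, i.e. log κ_p ~ Normal(μ, σ²), and evaluates the joint log density of the κ values for given (μ, σ). The hyperpriors that would be placed on μ and σ in a full analysis are omitted, and the function name and values are hypothetical.

```python
import numpy as np
from scipy.stats import lognorm


def hpm_log_density(kappas, mu, sigma):
    """Joint log density of per-partition parameters under a log-normal HPM.

    Each kappa_p > 0 satisfies log(kappa_p) ~ Normal(mu, sigma^2), so partitions
    may differ while sharing information through the hyperparameters (mu, sigma).
    """
    return np.sum(lognorm.logpdf(kappas, s=sigma, scale=np.exp(mu)))


# Hypothetical kappa values for three codon-position partitions.
kappas = np.array([4.2, 3.1, 9.8])
for mu, sigma in [(np.log(5.0), 0.5), (np.log(5.0), 2.0)]:
    print(mu, sigma, hpm_log_density(kappas, mu, sigma))
```

A small σ pulls the partitions toward a common value, whereas a large σ lets them differ freely; inferring σ lets the data decide how much information the partitions share.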
Marginal likelihood estimation to compare models using Bayes factors has become common practice in Bayesian phylogenetic inference. BEAST 1.10 features marginal likelihood estimation (Baele et al. 2012) using path sampling (Gelman and Meng 1998; Lartillot and Philippe 2006) and stepping-stone sampling (Xie et al. 2011), as well as the recently developed generalized stepping-stone sampling (Fan et al. 2011; Baele et al. 2016a), which offers increased accuracy and improved numerical stability by employing 'working distributions', i.e. distributions with known normalizing constants that are parameterized using samples from the posterior distribution.
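Stepping-stone sampling estimates the marginal likelihood p(y) as a product of ratios between successive power posteriors p_β(θ) ∝ p(y | θ)^β p(θ), with 0 = β₀ < β₁ < … < β_K = 1. The sketch below applies the estimator to a toy conjugate model (y | θ ~ N(θ, 1), θ ~ N(0, 1)) in which every power posterior can be sampled exactly and the true log marginal likelihood, log N(y; 0, 2), is known. In a phylogenetic analysis the draws at each β would instead come from MCMC. The β schedule at evenly spaced quantiles of a Beta(0.3, 1) distribution follows the choice proposed by Xie et al. (2011); the step and sample counts are arbitrary.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def log_likelihood(theta, y):
    """log p(y | theta) for y ~ Normal(theta, 1)."""
    return norm.logpdf(y, loc=theta, scale=1.0)


def sample_power_posterior(beta, y, n, rng):
    """Exact draws from p(y | theta)^beta * N(theta; 0, 1) in the conjugate toy model."""
    precision = 1.0 + beta
    return rng.normal(beta * y / precision, 1.0 / np.sqrt(precision), size=n)


def stepping_stone(y, n_steps=32, n_samples=4000, alpha=0.3, seed=1):
    """Stepping-stone estimate of log p(y)."""
    rng = np.random.default_rng(seed)
    betas = (np.arange(n_steps + 1) / n_steps) ** (1.0 / alpha)  # Beta(alpha, 1) quantiles
    log_ml = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        theta = sample_power_posterior(b_prev, y, n_samples, rng)
        # log of the Monte Carlo mean of p(y | theta)^(b_next - b_prev), computed stably
        log_ml += logsumexp((b_next - b_prev) * log_likelihood(theta, y)) - np.log(n_samples)
    return log_ml


y = 1.3
exact = norm.logpdf(y, loc=0.0, scale=np.sqrt(2.0))
print(stepping_stone(y), exact)
```

Concentrating the β values near 0, where the power posterior changes fastest, is what keeps each ratio well estimated; a log Bayes factor is then the difference of two such estimates.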
4 Performance and efficiency
Increasing model complexity and sequence availability in modern-day analyses have stretched the computational demands of Bayesian phylogenetic inference. To improve efficiency for large-scale sequence data, BEAST 1.10 uses the BEAGLE library (Ayres et al. 2012), which provides access to massive parallelization on a range of computing architectures. In particular, the combination of BEAST 1.10 with BEAGLE 3.0 (Ayres et al., under review) allows multiple data partitions to be parallelized across a single high-performance device (i.e. a GPGPU graphics board), using the full capacity of these devices and reducing computational overhead. As the complexity of phylogenetic model designs increases, concomitant with the surge in the scale of genomic data, updating only a parameter associated with a single data partition limits the occupancy of these massively multicore devices. To address this, we have developed an adaptive multivariate transition kernel that simultaneously updates parameters across all the partitioned data, making more efficient use of the available hardware (Baele et al. 2017). Through a combination of these two advances, BEAST 1.10 can yield a sizeable increase in effectively independent posterior samples per unit time over previous software versions. For the example data described below, we see a 5- to 25-fold improvement, depending on the model parameter, using an NVIDIA Titan V.
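Speed-ups such as the 5- to 25-fold figure above are expressed in effectively independent samples per unit time, i.e. effective sample size (ESS) divided by wall-clock time. The sketch below computes that metric, assuming Geyer's initial positive sequence estimator for the integrated autocorrelation time; this is one common choice, and the estimators used by BEAST's companion tools may differ. The AR(1) chain is a hypothetical stand-in for an MCMC trace, chosen because its true ESS is approximately n(1 − φ)/(1 + φ).

```python
import time

import numpy as np


def autocorrelation(x):
    """Normalized autocorrelation of a 1-D trace via a zero-padded FFT."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n] / n
    return acov / acov[0]


def effective_sample_size(x):
    """ESS = n / tau, with tau = 1 + 2 * sum of autocorrelations (Geyer truncation)."""
    rho = autocorrelation(x)
    n = rho.size
    tau = -1.0
    for k in range(0, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair <= 0:  # stop at the first non-positive pair of autocorrelations
            break
        tau += 2.0 * pair
    return n / tau


def ess_per_second(trace, seconds):
    """Efficiency metric for comparing samplers or hardware back-ends."""
    return effective_sample_size(trace) / seconds


def ar1_chain(phi, n, seed=0):
    """Hypothetical stand-in for an MCMC trace with lag-1 correlation phi."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x


start = time.perf_counter()
trace = ar1_chain(0.5, 50_000)
elapsed = time.perf_counter() - start
print(effective_sample_size(trace))  # approximately 50_000 * (1 - 0.5) / (1 + 0.5)
print(ess_per_second(trace, elapsed))
```

Dividing by time rather than by iteration count is what makes hardware acceleration and better-mixing kernels comparable on a single scale: a kernel that is slower per iteration can still win if it reduces autocorrelation enough.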
4.1 Example
Figure 1 presents a spatiotemporal reconstruction of Ebola virus evolution and spread during the 2013–2016 West African epidemic, highlighting several aspects of phylodynamic data integration. The estimates are based on a large data set of 1,610 genomes that represent over 5 per cent of the known cases (Dudas et al. 2017). Administrative regions (n = 56) are included as discrete sampling locations to estimate viral dispersal through time, while the contribution of a set of potential covariates to the pattern of spread is tested using a GLM parameterization of phylogeographic diffusion (Lemey et al. 2014). This analysis indicates, for example, the importance of population sizes and geographic distance in explaining viral dispersal intensities.
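In a GLM parameterization of discrete phylogeographic diffusion, the log of each between-location rate is a linear combination of predictors, log Λ_ij = Σ_k δ_k β_k X_ijk, where binary indicators δ_k switch predictors in or out. Support for a predictor can then be summarized as a Bayes factor comparing its posterior and prior odds of inclusion, which is how thresholds such as those drawn in Figure 1 are read. The sketch assumes hypothetical predictor arrays and a prior inclusion probability q = 1 − 0.5^(1/K), i.e. a 50% prior probability that no predictor is included; treat that prior as an assumption, not a statement of the settings used for Figure 1.

```python
import numpy as np


def glm_rates(X, beta, delta):
    """Between-location rates: log rate_ij = sum_k delta_k * beta_k * X[k, i, j].

    X     : (K, n, n) predictor arrays (e.g. hypothetical log-standardized distance
            or population sizes); diagonal entries are ignored
    beta  : (K,) coefficients
    delta : (K,) 0/1 inclusion indicators
    """
    log_rates = np.tensordot(np.asarray(delta) * np.asarray(beta), X, axes=1)
    rates = np.exp(log_rates)
    np.fill_diagonal(rates, 0.0)  # no self-transitions in the rate matrix off-diagonals
    return rates


def inclusion_bayes_factor(posterior_prob, n_predictors):
    """Posterior odds / prior odds that a predictor is included.

    Assumes a prior inclusion probability q = 1 - 0.5**(1/K), i.e. a 50% prior
    chance that no predictor is included.
    """
    q = 1.0 - 0.5 ** (1.0 / n_predictors)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = q / (1.0 - q)
    return posterior_odds / prior_odds


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 3))  # two hypothetical predictors, three locations
rates = glm_rates(X, beta=np.array([-1.2, 0.6]), delta=np.array([1, 0]))
for p in (0.5, 0.9, 0.99):
    print(p, inclusion_bayes_factor(p, n_predictors=7))
```

Because the prior odds of inclusion are small when there are several candidate predictors, even a moderate posterior inclusion probability can correspond to substantial Bayes factor support.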
Figure 1. Phylodynamic analysis of the 2013–2016 West African Ebola virus epidemic, encompassing simultaneous estimation from sequence and discrete (geographic) trait data, with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations. Plotted are a snapshot of geographic spread using SpreaD3 (Bielejec et al. 2016b), the maximum clade credibility tree, the posterior estimates of the GLM coefficients for seven possible predictors of Ebola virus spread (Bayes factor support values of 3, 20, and 150 are indicated by vertical lines), and the effective population size through time estimated by incorporating case counts.
5 Relationship to BEAST2 and other software
Distinct from BEAST 1.10 described here, BEAST2 is an independent project (Bouckaert et al. 2014) intended as a platform that more readily facilitates the development of packages of models and analyses by other researchers. Although both projects share many of the same models and the underlying inference framework, BEAST has increasingly focused on the analysis of rapidly evolving pathogens, their evolution, and their epidemiology. We affirm that BEAST will continue to be developed in parallel to BEAST2. While these projects share a recent common origin, each now aims to foster complementary research domains.
A range of other software focusing on phylodynamic analyses of fast-evolving pathogens has been described since the last version of BEAST was published. Of particular note are LSD (To et al. 2016), TreeDater (Volz and Frost 2017), and TreeTime (Sagulenko et al. 2018). These programs use least-squares algorithms (LSD) or maximum likelihood inference (TreeDater, TreeTime) and provide rapid analysis of large data sets for a subset of the models that BEAST provides. However, the former program implements very limited phylodynamic models, and the latter two programs require as input a phylogenetic tree inferred using other software, conditioning parameter estimates on this single tree.
5.1 Availability
BEAST 1.10 is open source under the GNU Lesser General Public License and is available at https://beast-dev.github.io/beast-mcmc for cross-platform compiled programs and at https://github.com/beast-dev/beast-mcmc for software development and source code. It requires Java version 1.6 or greater. Documentation, tutorials, and help are available at http://beast.community, and many users actively discuss BEAST usage and development in the 'beast-users' Google Groups discussion group (http://groups.google.com/group/beast-users). We also host an expanding suite of R tools designed for posterior analyses using BEAST (https://github.com/beast-dev/RBeast).
Acknowledgements
We would like to thank the many developers and contributors to BEAST 1.10, including Alex Alekseyenko, Trevor Bedford, Filip Bielejec, Erik Bloomquist, Luiz Carvalho, Gabriela Cybis, Gytis Dudas, Roald Forsberg, Mandev Gill, Matthew Hall, Joseph Heled, Sebastian Hoehna, Denise Kuehnert, Wai Lok Sibon Li, Gerton Lunter, Sidney Markowitz, Vladimir Minin, Julia Palacios, Michael Defoin Platel, Oliver Pybus, Beth Shapiro, Korbinian Strimmer, Max Tolkoff, Chieh-Hsi Wu, and Walter Xie. This work was supported in part by the European Union Seventh Framework Programme for research, technological development and demonstration under Grant Agreement no. 278433-PREDEMICS and no. 725422-ReservoirDOCS. The VIROGENESIS project receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 634650. The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z. M.A.S. is partly supported by NSF grant DMS 1264153 and NIH grants R01 HG006139, R01 AI107034, and U19 AI135995. P.L. acknowledges support by the Special Research Fund, KU Leuven ('Bijzonder Onderzoeksfonds', KU Leuven, OT/14/115), and the Research Foundation – Flanders ('Fonds voor Wetenschappelijk Onderzoek – Vlaanderen', G066215N, G0D5117N, and G0B9317N). G.B. acknowledges support from the Interne Fondsen KU Leuven/Internal Funds KU Leuven. D.L.A. is supported by NSF grant DBI 1661443. We gratefully acknowledge support from NVIDIA Corporation with the donation of parallel computing resources used for this research.
Conflict of interest: None declared.
References
Ayres, D. L., Cummings, M. P. et al. (under review) 'BEAGLE 3.0: Improved Usability for a High-Performance Computing Library for Statistical Phylogenetics', Systematic Biology.
Ayres, D. L., Darling, A., Zwickl, D. J., Beerli, P., Holder, M. T., Lewis, P. O., Huelsenbeck, J. P., Ronquist, F., Swofford, D. L., Cummings, M. P., Rambaut, A., and Suchard, M. A. (2012) 'BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics', Systematic Biology, 61: 170–3.
Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M. A., and Alekseyenko, A. V. (2012) 'Improving the Accuracy of Demographic and Molecular Clock Model Comparison While Accommodating Phylogenetic Uncertainty', Molecular Biology and Evolution, 29: 2157–67.
Baele, G., Lemey, P., Rambaut, A., and Suchard, M. A. (2017) 'Adaptive MCMC in Bayesian Phylogenetics: An Application to Analyzing Partitioned Data in BEAST', Bioinformatics, 33: 1798–805.
Baele, G., Lemey, P., and Suchard, M. A. (2016a) 'Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty', Systematic Biology, 65: 250–64.
Baele, G., Suchard, M. A., Bielejec, F., and Lemey, P. (2016b) 'Bayesian Codon Substitution Modeling to Identify Sources of Pathogen Evolutionary Rate Variation', Microbial Genomics, 2: e00005.
Bedford, T., Suchard, M. A., Lemey, P., Dudas, G., Gregory, V., Hay, A. J., McCauley, J. W., Russell, C. A., Smith, D. J., and Rambaut, A. (2014) 'Integrating Influenza Antigenic Dynamics with Molecular Evolution', eLife, 3: e01914.
Bielejec, F., Baele, G., Rodrigo, A. G., Suchard, M. A., and Lemey, P. (2016a) 'Identifying Predictors of Time-Inhomogeneous Viral Evolutionary Processes', Virus Evolution, 2: vew023.
Bielejec, F., Baele, G., Vrancken, B., Suchard, M. A., Rambaut, A., and Lemey, P. (2016b) 'SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes', Molecular Biology and Evolution, 33: 2167–9.
Bouckaert, R., Heled, J., Kuhnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M. A., Rambaut, A., and Drummond, A. J. (2014) 'BEAST 2: A Software Platform for Bayesian Evolutionary Analysis', PLoS Computational Biology, 10: e1003537.
Cybis, G. B., Sinsheimer, J. S., Bedford, T., Mather, A. E., Lemey, P., and Suchard, M. A. (2015) 'Assessing Phenotypic Correlation through the Multivariate Phylogenetic Latent Liability Model', The Annals of Applied Statistics, 9: 969.
Drummond, A. J., Suchard, M. A., Xie, D., and Rambaut, A. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., Carvalho, L. M., Bedford, T., Tatem, A. J., Baele, G., Faria, N. R., Park, D. J., Ladner, J. T., Arias, A., Asogun, D., Bielejec, F., Caddy, S. L., Cotten, M., D'Ambrozio, J., Dellicour, S., Caro, A. D., Diclaro, J. D., II, Durrafour, S., Elmore, M. J., Fakoli, L. S., III, Faye, O., Gilbert, M. L., Gevao, S. M., Gire, S., Gladden-Young, A., Gnirke, A., Goba, A., Grant, D. S., Haagmans, B. L., Hiscox, J. A., Jah, U., Kargbo, B., Kugelman, J. R., Liu, D., Lu, J., Malboeuf, C. M., Mate, S., Matthews, D. A., Matranga, C. B., Meredith, L. W., Qu, J., Quick, J., Pas, S. D., Phan, M. V. T., Pollakis, G., Reusken, C. B., Sanchez-Lockhart, M., Schaffner, S. F., Schieffelin, J. S., Sealfon, R. S., Simon-Loriere, E., Smits, S. L., Stoecker, K., Thorne, L., Tobin, E. A., Vandi, M. A., Watson, S. J., West, K., Whitmer, S., Wiley, M. R., Winnicki, S. M., Wohl, S., Wolfel, R., Yozwiak, N. L., Andersen, K. G., Blyden, S. O., Bolay, F., Carroll, M. W., Dahn, B., Diallo, B., Formenty, P., Fraser, C., Gao, G. F., Garry, R. F., Goodfellow, I., Gunther, S., Happi, C. T., Holmes, E. C., Keïta, S., Kellam, P., Koopmans, M. P. G., Kuhn, J. H., Loman, N. J., Magassouba, N., Naidoo, D., Nichol, S. T., Nyenswah, T., Palacios, G., Pybus, O. G., Sabeti, P. C., Sall, A., Stroher, U., Wurie, I., Suchard, M. A., Lemey, P., and Rambaut, A. (2017) 'Virus Genomes Reveal Factors That Spread and Sustained the Ebola Epidemic', Nature, 544: 309–15.
Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2011) 'Choosing among Partition Models in Bayesian Phylogenetics', Molecular Biology and Evolution, 28: 523–32.
Gelman, A., and Meng, X.-L. (1998) 'Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling', Statistical Science, 13: 163–85.
Gill, M. S., Ho, L. S. T., Baele, G., Lemey, P., and Suchard, M. A. (2017) 'A Relaxed Directional Random Walk Model for Phylogenetic Trait Evolution', Systematic Biology, 66: 299–319.
Gill, M. S., Lemey, P., Bennett, S. N., Biek, R., and Suchard, M. A. (2016) 'Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates', Systematic Biology, 65: 1041–56.
Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L. N., Daly, J. M., Mumford, J. A., and Holmes, E. C. (2004) 'Unifying the Epidemiological and Evolutionary Dynamics of Pathogens', Science, 303: 327–32.
Hall, M., Woolhouse, M., and Rambaut, A. (2015) 'Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set', PLoS Computational Biology, 11: e1004613.
Lartillot, N., and Philippe, H. (2006) 'Computing Bayes Factors Using Thermodynamic Integration', Systematic Biology, 55: 195–207.
Lemey, P., Minin, V. N., Bielejec, F., Pond, S. L. K., and Suchard, M. A. (2012) 'A Counting Renaissance: Combining Stochastic Mapping and Empirical Bayes to Quickly Detect Amino Acid Sites under Positive Selection', Bioinformatics, 28: 3248–56.
Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Lemey, P., Rambaut, A., Welch, J., and Suchard, M. (2010) 'Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time', Molecular Biology and Evolution, 27: 1877–85.
Minin, V. N., and Suchard, M. A. (2008) 'Fast, Accurate and Simulation-Free Stochastic Mapping', Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363: 3985–95.
Quick, J., Loman, N., Duraffour, S., Simpson, J., et al. (2016) 'Real-Time, Portable Genome Sequencing for Ebola Surveillance', Nature, 530: 228–32.
Sagulenko, P., Puller, V., and Neher, R. A. (2018) 'TreeTime: Maximum-Likelihood Phylodynamic Analysis', Virus Evolution, 4: vex042.
Smith, D. J., Lapedes, A. S., de Jong, J. C., Bestebroer, T. M., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., and Fouchier, R. A. M. (2004) 'Mapping the Antigenic and Genetic Evolution of Influenza Virus', Science, 305: 371–6.
Suchard, M. A., Kitchen, C. M. R., Sinsheimer, J. S., and Weiss, R. E. (2003) 'Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data', Systematic Biology, 52: 649–64.
To, T.-H., Jung, M., Lycett, S., and Gascuel, O. (2016) 'Fast Dating Using Least-Squares Criteria and Algorithms', Systematic Biology, 65: 82–97.
Tolkoff, M. R., Alfaro, M. E., Baele, G., Lemey, P., and Suchard, M. A. (2018) 'Phylogenetic Factor Analysis', Systematic Biology, 67: 384–99.
Volz, E., and Frost, S. (2017) 'Scalable Relaxed Clock Phylogenetic Dating', Virus Evolution, 3: vex025.
Vrancken, B., Rambaut, A., Suchard, M. A., Drummond, A., Baele, G., Derdelinckx, I., Van Wijngaerden, E., Vandamme, A.-M., Van Laethem, K., and Lemey, P. (2014) 'The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates', PLoS Computational Biology, 10: e1003505.
Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2011) 'Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection', Systematic Biology, 60: 150–60.
M A Suchard et al | 5
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
- l
-
Bayesian phylogenetic and phylodynamic data
integration using BEAST 110Marc A Suchard123dagger Philippe Lemey4Dagger Guy Baele4sect Daniel L Ayres5
Alexei J Drummond67 and Andrew Rambaut81Department of Biomathematics David Geffen School of Medicine University of California Los Angeles 621Charles E Young Dr South Los Angeles CA 90095 USA 2Department of Biostatistics Fielding School ofPublic Health University of California Los Angeles 650 Charles E Young Dr South Los Angeles CA 90095USA 3Department of Human Genetics David Geffen School of Medicine University of California Los Angeles695 Charles E Young Dr South Los Angeles CA 90095 USA 4Department of Microbiology and ImmunologyRega Institute KU Leuven Herestraat 49 3000 Leuven Belgium 5Center for Bioinformatics andComputational Biology University of Maryland College Park 125 Biomolecular Science Bldg 296 CollegePark MD 20742 USA 6Department of Computer Science University of Auckland 30338 Princes StAuckland 1010 NZ 7Centre for Computational Evolution University of Auckland 30338 Princes StAuckland 1010 NZ and 8Institute of Evolutionary Biology University of Edinburgh Ashworth LaboratoriesEdinburgh EH9 3FL UK
Corresponding author E-mail msucharduclaedu (MAS) alexeicsaucklandacnz (AJD) arambautedacuk (AR)daggerhttporcidorg0000-0001-9818-479X
Daggerhttporcidorg0000-0003-2826-5353
secthttporcidorg0000-0002-1915-7732
httporcidorg0000-0003-4337-3707
Abstract
The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesianphylogenetic and phylodynamic inference from genetic sequence data BEAST unifies molecular phylogenetic reconstruc-tion with complex discrete and continuous trait evolution divergence-time dating and coalescent demographic models inan efficient statistical inference engine using Markov chain Monte Carlo integration A convenient cross-platform graphicaluser interface allows the flexible construction of complex evolutionary analyses
Key words phylogenetics phylodynamics Bayesian inference Markov chain Monte Carlo
1 Introduction
First released over 14 years ago the Bayesian EvolutionaryAnalysis by Sampling Trees (BEAST) software package has be-come firmly established in a broad diversity of biological fieldsfrom phylogenetics and paleontology population dynamics
ancient DNA and the phylodynamics and molecular epidemiol-ogy of infectious disease (Drummond et al 2012) BEASTrsquos spe-cific focus on time-scaled trees and the evolutionary analysesdependent on them has given it a unique place in the toolboxof molecular evolution and phylogenetic researchers Since in-ception a strong motivation for BEAST development has been
VC The Author(s) 2018 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby40)which permits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited
1
Virus Evolution 2018 4(1) vey016
doi 101093vevey016Resources
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
the rapid growth of pathogen genome sequencing as part ofpublic health responses to infectious diseases (Grenfell et al2004) In particular fast evolving viruses can now be tracked innear real-time (see eg Quick et al 2016) to understand their ep-idemiology and evolutionary dynamics
In BEAST version 110 we have introduced a series of advan-ces with a particular focus on delivering accurate and informa-tive insights for infectious disease research through theintegration of diverse data sources including phenotypic andepidemiological information with molecular evolutionarymodels These advances fall into three broad themesmdashthe inte-gration of diverse sources of extrinsic information as covariatesof evolutionary processes the increased flexibility and modula-rization of the model design process with robust and accuratemodel testing methods and substantial improvements on thespeed and efficiency of the statistical inference
2 Data integration
Many traits in phylogenetics are represented as or partitionedinto a finite number of discrete values with geographical loca-tion standing out as a popular example Because BEAST is dedi-cated to sampling time-scaled phylogenies new developmentsof discrete character mapping enable the reconstruction oftimed viral dispersal patterns while accommodating phyloge-netic uncertainty By extending the discrete diffusion models toincorporate empirical data as covariates or predictors of transi-tion rates BEAST can simultaneously test and quantify a rangeof potential predictive variables of the diffusion process (Lemeyet al 2014) Further realizations of the trait transition processcan also be efficiently produced to pinpoint the nature and tim-ing of changes in evolutionary history beyond ancestral nodestate reconstruction (termed Markov jumps) or to infer the timespent in a particular state (Markov rewards) (Minin and Suchard2008) For molecular data fast stochastic mapping approachesare also employed to obtain site-specific dN=dS estimates inte-grating over the posterior distribution of phylogenies and an-cestral reconstructions to quantify uncertainty on thesemeasures of the selective forces on individual codons (Lemeyet al 2012)
Multivariate continuous traits are incorporated using phylo-genetic Brownian diffusion processes modelling the shared an-cestral dependence across taxa and the correlations betweenthese variables Such continuous models have most frequentlybeen applied to diffusion on a geographical landscape with thetraits representing coordinates and the phylogeny reconstruct-ing the epidemiological process within the host population(Lemey et al 2010) The landscapes can also represent otherspaces and integration of antibody binding assay data have ex-tended lsquoantigenic cartographyrsquo (Smith et al 2004) approaches tomodel simultaneous antigenic and genetic evolution and inferthe viral trajectories in the immunological space generated bythe host population (Bedford et al 2014)
Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may, however, be unrealistic for many evolutionary problems (including geographical reconstruction). A recently developed relaxed directional random walk allows the diffusion process to take on different directional trends in different parts of the phylogeny while preserving model identifiability (Gill et al. 2017), opening up these processes to a wide range of applications. BEAST 1.10 also extends multivariate phylogenetic diffusion to latent liability model formulations in order to assess correlations between traits of different data types, including (various combinations of) continuous, binary and discrete traits (Cybis et al. 2015), as demonstrated by applications to flower morphology, antibiotic resistance and viral epitope evolution. To infer correlations between high-dimensional traits efficiently, a novel phylogenetic factor analysis approach assumes that a small, unknown number of independent evolutionary factors evolve along the phylogeny and generate clusters of dependent traits at the tips (Tolkoff et al. 2018).
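The latent liability idea can be sketched as a mapping from continuous latent values, which are assumed to evolve by multivariate Brownian diffusion along the tree, to the observed traits. The sketch below assumes one common formulation: a binary trait is 1 when its single liability is positive, and a K-state discrete trait uses K - 1 liabilities, with state 0 as a reference observed when no liability is positive and otherwise the state with the largest liability. The function names are hypothetical and the sketch is not taken from the BEAST code base.

```python
from typing import Sequence

def binary_from_liability(latent: float) -> int:
    """Threshold model: the binary trait is 1 when the latent liability is positive."""
    return 1 if latent > 0.0 else 0

def discrete_from_liabilities(latents: Sequence[float]) -> int:
    """K-state trait from K - 1 liabilities (multinomial-probit style).

    State 0 is the reference, whose liability is fixed at 0; otherwise the
    state with the largest positive liability is the one observed."""
    best_state, best_value = 0, 0.0
    for state, value in enumerate(latents, start=1):
        if value > best_value:
            best_state, best_value = state, value
    return best_state

# A tip with one continuous trait, one binary trait and one 3-state trait
# would carry 1 + 1 + 2 = 4 latent dimensions in the diffusion model.
observed = (binary_from_liability(0.7), discrete_from_liabilities([-0.2, 1.3]))
```

Because all latent dimensions share one diffusion covariance matrix, an off-diagonal entry of that matrix translates into an association between, say, a binary resistance phenotype and a continuous trait at the tips.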
Further extending the data integration approach, BEAST 1.10 includes a flexible framework for incorporating time-varying covariates of the effective population size. This framework uses Gaussian Markov random fields to reconstruct smoothed effective population size trajectories while simultaneously estimating to what extent predictor variables (e.g. fluctuations in climatic factors, host mobility or vector density) may have driven the dynamics (Gill et al. 2016). Using a similar generalized linear modeling (GLM) approach, classical epidemiological time-series data such as case counts (Gill et al. 2016) can be integrated with pathogen genome sequence data to provide joint inference of important epidemiological parameters.
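One way to write the covariate-informed smoothing prior is to let the log effective population sizes γ on a time grid follow a first-order random-walk Gaussian Markov random field around a regression mean, γ = Zβ + u, where row t of Z holds the covariate values at grid point t. The sketch below evaluates the log-density of the random-walk increments of u under that assumption; the function name and argument layout are hypothetical, and the sketch leaves out the coalescent likelihood and the priors on β and the precision.

```python
import math
from typing import Sequence

def gmrf_covariate_log_prior(log_ne: Sequence[float],
                             covariates: Sequence[Sequence[float]],
                             betas: Sequence[float],
                             precision: float) -> float:
    """Log-density of the increments of a first-order random-walk GMRF placed on
    u = log_ne - Z beta (row t of Z = covariate values at grid point t)."""
    mean = [sum(b * z for b, z in zip(betas, row)) for row in covariates]
    u = [g - m for g, m in zip(log_ne, mean)]
    increments = [u[t + 1] - u[t] for t in range(len(u) - 1)]
    n = len(increments)
    # Each increment is Normal(0, 1 / precision).
    return (0.5 * n * math.log(precision / (2.0 * math.pi))
            - 0.5 * precision * sum(d * d for d in increments))

# A log-Ne trajectory that rises linearly over three grid points.
log_ne = [0.0, 1.0, 2.0]
no_effect = gmrf_covariate_log_prior(log_ne, [[0.0], [0.0], [0.0]], [0.0], 1.0)
with_trend = gmrf_covariate_log_prior(log_ne, [[0.0], [1.0], [2.0]], [1.0], 1.0)
```

When a covariate tracks the trajectory (`with_trend`), the residual increments vanish and the prior density rises relative to `no_effect`; this is the mechanism by which the data inform the covariate coefficient β.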
Finally, recent host-transmission models allow the integration of complete or partial knowledge of a pathogen's transmission history, enabling the simultaneous inference of within-host population dynamics, viral evolutionary processes, and transmission times and bottlenecks (Vrancken et al. 2014). Likewise, other priors enable the reconstruction of transmission trees of infectious disease epidemics and outbreaks while accommodating phylogenetic uncertainty, using a newly designed set of phylogenetic tree proposals that respect node partitions (Hall et al. 2015).
3 Flexible model design
BEAST's companion graphical user interface program, BEAUti, allows the user to import data, select models, choose prior distributions and specify the settings for both Bayesian inference and marginal likelihood estimation. Our efforts on BEAUti 1.10 have focused on allowing the user to easily link or unlink substitution, clock and tree models across multiple partitions, as well as to link individual parameters, providing considerable adaptability in model design. BEAUti can also group various parameters in a hierarchical phylogenetic model prior (Suchard et al. 2003), which allows parameters to take different values but links them through a common distribution whose parameters can then be inferred. For example, flexible codon model parameterizations using hierarchical phylogenetic models (Baele et al. 2016b) and the incorporation of a range of potential predictors of substitution behaviour (Bielejec et al. 2016a) provide insight into the tempo and mode of pathogen evolution.
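A hierarchical phylogenetic model prior can be read as a two-level density: each partition keeps its own parameter value, but all values are drawn from a shared distribution whose parameters are estimated too. The sketch below assumes a Normal distribution on, for example, per-partition log transition/transversion ratios, plus a hypothetical Normal(0, 10) hyperprior on the shared mean; the function names and the choice of distributions are illustrative only.

```python
import math
from typing import Sequence

def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def hpm_log_density(values: Sequence[float], mu: float, sigma: float,
                    mu_prior_sd: float = 10.0) -> float:
    """Joint log-density of partition-specific values linked through a shared
    Normal(mu, sigma) distribution, plus a hypothetical Normal(0, mu_prior_sd)
    hyperprior on mu (a prior on sigma would be added in the same way)."""
    linked = sum(normal_logpdf(v, mu, sigma) for v in values)
    return linked + normal_logpdf(mu, 0.0, mu_prior_sd)

# Hypothetical log kappa values for three codon-position partitions.
similar = hpm_log_density([1.9, 2.0, 2.1], mu=2.0, sigma=0.5)
dissimilar = hpm_log_density([0.5, 2.0, 3.5], mu=2.0, sigma=0.5)
```

Partitions whose values lie close to the shared mean are favoured, so sparsely informative partitions borrow strength from the others, while sampling `sigma` lets the data decide how strongly the partitions are pooled.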
Marginal likelihood estimation to compare models using Bayes factors has become common practice in Bayesian phylogenetic inference. BEAST 1.10 features marginal likelihood estimation (Baele et al. 2012) using path sampling (Gelman and Meng 1998; Lartillot and Philippe 2006) and stepping-stone sampling (Xie et al. 2011), as well as the recently developed generalized stepping-stone sampling (Fan et al. 2011; Baele et al. 2016a), which offers increased accuracy and improved numerical stability by employing 'working distributions', i.e. distributions with known normalizing constants that are parameterized using samples from the posterior distribution.
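Both path sampling and stepping-stone sampling rely on a sequence of power posteriors p_β(θ) ∝ L(θ)^β π(θ), with β running from 0 to 1. The sketch below shows the two standard estimators given log-likelihood values already sampled at each β; it assumes the plain prior at β = 0 (generalized stepping-stone sampling would replace it with a working distribution, which is not shown), and the function names are hypothetical.

```python
import math
from typing import Sequence

def log_mean_exp(values: Sequence[float]) -> float:
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def stepping_stone_log_ml(betas: Sequence[float],
                          loglik_samples: Sequence[Sequence[float]]) -> float:
    """betas: increasing, from 0 to 1. loglik_samples[k] holds log-likelihoods
    sampled from the power posterior at betas[k]; the last entry is not needed.
    Each ratio Z(b_k) / Z(b_{k-1}) is E_{b_{k-1}}[L ** (b_k - b_{k-1})]."""
    total = 0.0
    for k in range(1, len(betas)):
        step = betas[k] - betas[k - 1]
        total += log_mean_exp([step * ll for ll in loglik_samples[k - 1]])
    return total

def path_sampling_log_ml(betas: Sequence[float],
                         loglik_samples: Sequence[Sequence[float]]) -> float:
    """Trapezoid-rule integration of E_beta[log L] over beta in [0, 1]."""
    means = [sum(s) / len(s) for s in loglik_samples]
    return sum((betas[k] - betas[k - 1]) * (means[k] + means[k - 1]) / 2.0
               for k in range(1, len(betas)))

# Sanity check: if the likelihood were constant, both estimates equal log L.
betas = [0.0, 0.5, 1.0]
flat = [[-10.0, -10.0], [-10.0, -10.0], [-10.0, -10.0]]
ss, ps = stepping_stone_log_ml(betas, flat), path_sampling_log_ml(betas, flat)
```

In practice the β schedule is usually concentrated near 0, where E_β[log L] changes fastest, and each power posterior is sampled by its own MCMC run.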
2 | Virus Evolution 2018 Vol 4 No 1
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
4 Performance and efficiency
Increasing model complexity and sequence availability in modern-day analyses have stretched the computational demands of Bayesian phylogenetic inference. To improve efficiency for large-scale sequence data, BEAST 1.10 uses the BEAGLE library (Ayres et al. 2012), which provides access to massive parallelization on a range of computing architectures. In particular, the combination of BEAST 1.10 with BEAGLE 3.0 (Ayres et al. under review) allows multiple data partitions to be parallelized across a single high-performance device (i.e. a GPGPU graphics board), making use of the full capacity of these devices and reducing computational overheads. As the complexity of phylogenetic model designs increases, concomitant with the surge in the scale of genomic data, updating only a parameter associated with a single data partition limits the occupancy of these massively multicore devices. To address this, we have developed an adaptive multivariate transition kernel that simultaneously updates parameters across all the partitioned data, making more efficient use of available hardware (Baele et al. 2017). Through a combination of these two advances, BEAST 1.10 can yield a sizeable increase in effectively independent posterior samples per unit time over previous software versions. For the example data described below, we see a 5- to 25-fold improvement, depending on the model parameter, using an NVIDIA Titan V.
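The sketch below shows a generic adaptive multivariate random-walk Metropolis kernel of the kind described above: one joint Gaussian proposal moves all parameters at once (for instance, one parameter per data partition), and its covariance is learned from the chain's history. It assumes a mixture of a small fixed Gaussian and a Gaussian scaled by 2.38²/d times the running empirical covariance; this is a common textbook construction and is not necessarily identical to the kernel of Baele et al. (2017). All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low

def adaptive_mvn_metropolis(log_target: Callable[[Sequence[float]], float],
                            x0: Sequence[float], n_iter: int, seed: int = 1,
                            fixed_prob: float = 0.05, warmup: int = 100) -> List[List[float]]:
    """Random-walk Metropolis whose joint proposal covariance is learned from the chain."""
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    lp = log_target(x)
    count, mean = 1, list(x)
    m2 = [[0.0] * d for _ in range(d)]  # running sum of centred outer products
    samples = []
    for it in range(1, n_iter + 1):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if it > warmup and rng.random() > fixed_prob:
            # Adapted component: N(0, (2.38^2 / d) * empirical covariance).
            cov = [[m2[i][j] / count + (1e-8 if i == j else 0.0) for j in range(d)]
                   for i in range(d)]
            low = cholesky(cov)
            step = [2.38 / math.sqrt(d) * sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)]
        else:
            # Small fixed component keeps the kernel well-behaved early on.
            step = [0.1 / math.sqrt(d) * zi for zi in z]
        proposal = [xi + si for xi, si in zip(x, step)]
        lp_new = log_target(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        samples.append(list(x))
        # Welford update of the chain's running mean and covariance.
        count += 1
        delta = [xi - mi for xi, mi in zip(x, mean)]
        mean = [mi + di / count for mi, di in zip(mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(d):
            for j in range(d):
                m2[i][j] += delta[i] * delta2[j]
    return samples

def log_std_normal(v: Sequence[float]) -> float:
    return -0.5 * sum(t * t for t in v)

draws = adaptive_mvn_metropolis(log_std_normal, [0.0, 0.0], n_iter=5000, seed=7)
```

Because every proposal touches all coordinates, one likelihood evaluation per iteration can be spread across all partitions at once, which is what lets a many-core device stay busy.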
4.1 Example
Figure 1 presents a spatiotemporal reconstruction of Ebola virus evolution and spread during the 2013–2016 West African epidemic, highlighting several aspects of phylodynamic data integration. The estimates are based on a large data set of 1610 genomes that represent over 5 per cent of the known cases (Dudas et al. 2017). Administrative regions (n = 56) are included as discrete sampling locations to estimate viral dispersal through time, while testing the contribution of a set of potential covariates to the pattern of spread using a GLM parameterization of phylogeographic diffusion (Lemey et al. 2014). This analysis indicates, for example, the importance of population sizes and geographic distance in explaining viral dispersal intensities.
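The Bayes factor support values for GLM predictors (such as the thresholds of 3, 20 and 150 marked in Fig. 1) are commonly computed from the inclusion indicators as posterior odds of inclusion divided by prior odds of inclusion. The sketch below assumes that definition; the prior inclusion probability used here, derived from a hypothetical 0.5 prior probability that none of seven predictors is included, is an illustrative assumption rather than the setting of this particular analysis.

```python
import math
from typing import Sequence

def inclusion_probability(indicator_samples: Sequence[int]) -> float:
    """Posterior inclusion probability: fraction of MCMC samples with indicator = 1."""
    return sum(indicator_samples) / len(indicator_samples)

def indicator_bayes_factor(posterior_p: float, prior_p: float) -> float:
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    if posterior_p >= 1.0:
        return math.inf  # every sample included the predictor
    return (posterior_p / (1.0 - posterior_p)) / (prior_p / (1.0 - prior_p))

# Hypothetical prior: probability 0.5 that none of 7 independent indicators is on,
# which gives each indicator a prior inclusion probability q of about 0.094.
q = 1.0 - 0.5 ** (1.0 / 7.0)
bf = indicator_bayes_factor(inclusion_probability([1, 1, 1, 0] * 25), q)
```

Under this prior, a predictor included in 75 per cent of posterior samples obtains a Bayes factor of roughly 29, which would exceed the threshold of 20 but not 150.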
Figure 1. Phylodynamic analysis of the 2013–2016 West African Ebola virus epidemic, encompassing simultaneous estimation of sequence and discrete (geographic) trait data, with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations. Plotted are a snapshot of geographic spread using SpreaD3 (Bielejec et al. 2016b), the maximum clade credibility tree, the posterior estimates of the GLM coefficients for seven possible predictors of Ebola virus spread (Bayes factor support values of 3, 20 and 150 are indicated by vertical lines), and the effective population size through time estimated by incorporating case counts.
5 Relationship to BEAST2 and other software
Distinct from BEAST 1.10 described here, BEAST2 is an independent project (Bouckaert et al. 2014) intended as a platform that more readily facilitates the development of packages of models and analyses by other researchers. Although both projects share many of the same models and the underlying inference framework, BEAST has increasingly focused on the analysis of rapidly evolving pathogens and their evolution and epidemiology. We affirm that BEAST will continue to be developed in parallel to BEAST2. While these projects share a recent common origin, each now aims to foster complementary research domains.
A range of other software focusing on phylodynamic analyses of fast-evolving pathogens has been described since the last version of BEAST was published. Of particular note are LSD (To et al. 2016), TreeDater (Volz and Frost 2017) and TreeTime (Sagulenko et al. 2018). These programs use least-squares algorithms (LSD) or maximum likelihood inference (TreeDater, TreeTime) and provide rapid analysis of large data sets for a subset of the models that BEAST provides. However, the first program (LSD) implements very limited phylodynamic models, and the latter two programs (TreeDater and TreeTime) require a phylogenetic tree inferred using other software as input data, conditioning parameter estimates on this single tree.
5.1 Availability
BEAST 1.10 is open source under the GNU Lesser General Public License and is available at https://beast-dev.github.io/beast-mcmc for cross-platform compiled programs and at https://github.com/beast-dev/beast-mcmc for software development and source code. It requires Java version 1.6 or greater. Documentation, tutorials and help are available at http://beast.community, and many users actively discuss BEAST usage and development in the 'beast-users' Google Groups discussion group (http://groups.google.com/group/beast-users). We also host an expanding suite of R tools designed for posterior analyses using BEAST (https://github.com/beast-dev/RBeast).
Acknowledgements
We would like to thank the many developers and contributors to BEAST 1.10, including Alex Alekseyenko, Trevor Bedford, Filip Bielejec, Erik Bloomquist, Luiz Carvalho, Gabriela Cybis, Gytis Dudas, Roald Forsberg, Mandev Gill, Matthew Hall, Joseph Heled, Sebastian Hoehna, Denise Kuehnert, Wai Lok Sibon Li, Gerton Lunter, Sidney Markowitz, Vladimir Minin, Julia Palacios, Michael Defoin Platel, Oliver Pybus, Beth Shapiro, Korbinian Strimmer, Max Tolkoff, Chieh-Hsi Wu and Walter Xie. This work was supported in part by the European Union Seventh Framework Programme for research, technological development and demonstration under Grant Agreement no. 278433-PREDEMICS and no. 725422-ReservoirDOCS. The VIROGENESIS project receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 634650. The Artic Network receives funding from the Wellcome Trust through project 206298/Z/17/Z. MAS is partly supported by NSF grant DMS 1264153 and NIH grants R01 HG006139, R01 AI107034 and U19 AI135995. PL acknowledges support by the Special Research Fund, KU Leuven ('Bijzonder Onderzoeksfonds', KU Leuven, OT/14/115), and the Research Foundation – Flanders ('Fonds voor Wetenschappelijk Onderzoek – Vlaanderen', G066215N, G0D5117N and G0B9317N). GB acknowledges support from the Interne Fondsen KU Leuven/Internal Funds KU Leuven. DLA is supported by NSF grant DBI 1661443. We gratefully acknowledge support from NVIDIA Corporation with the donation of parallel computing resources used for this research.
Conflict of interest None declared
References
Ayres, D. L., Cummings, M. P., et al. (under review) 'BEAGLE 3.0: Improved Usability for a High-Performance Computing Library for Statistical Phylogenetics', Systematic Biology.
Ayres, D. L., Darling, A., Zwickl, D. J., Beerli, P., Holder, M. T., Lewis, P. O., Huelsenbeck, J. P., Ronquist, F., Swofford, D. L., Cummings, M. P., Rambaut, A., and Suchard, M. A. (2012) 'BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics', Systematic Biology, 61: 170–3.
Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M. A., and Alekseyenko, A. V. (2012) 'Improving the Accuracy of Demographic and Molecular Clock Model Comparison While Accommodating Phylogenetic Uncertainty', Molecular Biology and Evolution, 29: 2157–67.
Baele, G., Lemey, P., Rambaut, A., and Suchard, M. A. (2017) 'Adaptive MCMC in Bayesian Phylogenetics: An Application to Analyzing Partitioned Data in BEAST', Bioinformatics, 33: 1798–805.
Baele, G., Lemey, P., and Suchard, M. A. (2016a) 'Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty', Systematic Biology, 65: 250–64.
Baele, G., Suchard, M. A., Bielejec, F., and Lemey, P. (2016b) 'Bayesian Codon Substitution Modeling to Identify Sources of Pathogen Evolutionary Rate Variation', Microbial Genomics, 2: e00005.
Bedford, T., Suchard, M. A., Lemey, P., Dudas, G., Gregory, V., Hay, A. J., McCauley, J. W., Russell, C. A., Smith, D. J., and Rambaut, A. (2014) 'Integrating Influenza Antigenic Dynamics with Molecular Evolution', eLife, 3: e01914.
Bielejec, F., Baele, G., Rodrigo, A. G., Suchard, M. A., and Lemey, P. (2016a) 'Identifying Predictors of Time-Inhomogeneous Viral Evolutionary Processes', Virus Evolution, 2: vew023.
Bielejec, F., Baele, G., Vrancken, B., Suchard, M. A., Rambaut, A., and Lemey, P. (2016b) 'SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes', Molecular Biology and Evolution, 33: 2167–9.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M. A., Rambaut, A., and Drummond, A. J. (2014) 'BEAST 2: A Software Platform for Bayesian Evolutionary Analysis', PLoS Computational Biology, 10: e1003537.
Cybis, G. B., Sinsheimer, J. S., Bedford, T., Mather, A. E., Lemey, P., and Suchard, M. A. (2015) 'Assessing Phenotypic Correlation through the Multivariate Phylogenetic Latent Liability Model', The Annals of Applied Statistics, 9: 969.
Drummond, A. J., Suchard, M. A., Xie, D., and Rambaut, A. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., Carvalho, L. M., Bedford, T., Tatem, A. J., Baele, G., Faria, N. R., Park, D. J., Ladner, J. T., Arias, A., Asogun, D., Bielejec, F., Caddy, S. L., Cotten, M., D'Ambrozio, J., Dellicour, S., Caro, A. D., Diclaro, J. D., II, Duraffour, S., Elmore, M. J., Fakoli, L. S., III, Faye, O., Gilbert, M. L., Gevao, S. M., Gire, S., Gladden-Young, A., Gnirke, A., Goba, A., Grant, D. S., Haagmans, B. L., Hiscox, J. A., Jah, U., Kargbo, B., Kugelman, J. R., Liu, D., Lu, J., Malboeuf, C. M., Mate, S., Matthews, D. A., Matranga, C. B., Meredith, L. W., Qu, J., Quick, J., Pas, S. D., Phan, M. V. T., Pollakis, G., Reusken, C. B., Sanchez-Lockhart, M., Schaffner, S. F., Schieffelin, J. S., Sealfon, R. S., Simon-Lorière, E., Smits, S. L., Stoecker, K., Thorne, L., Tobin, E. A., Vandi, M. A., Watson, S. J., West, K., Whitmer, S., Wiley, M. R., Winnicki, S. M., Wohl, S., Wölfel, R., Yozwiak, N. L., Andersen, K. G., Blyden, S. O., Bolay, F., Carroll, M. W., Dahn, B., Diallo, B., Formenty, P., Fraser, C., Gao, G. F., Garry, R. F., Goodfellow, I., Günther, S., Happi, C. T., Holmes, E. C., Keïta, S., Kellam, P., Koopmans, M. P. G., Kuhn, J. H., Loman, N. J., Magassouba, N., Naidoo, D., Nichol, S. T., Nyenswah, T., Palacios, G., Pybus, O. G., Sabeti, P. C., Sall, A., Ströher, U., Wurie, I., Suchard, M. A., Lemey, P., and Rambaut, A. (2017) 'Virus Genomes Reveal Factors That Spread and Sustained the Ebola Epidemic', Nature, 544: 309–15.
Fan, Y., Wu, R., Chen, M. H., Kuo, L., and Lewis, P. O. (2011) 'Choosing among Partition Models in Bayesian Phylogenetics', Molecular Biology and Evolution, 28: 523–32.
Gelman, A., and Meng, X.-L. (1998) 'Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling', Statistical Science, 13: 163–85.
Gill, M. S., Ho, T., Si, L., Baele, G., Lemey, P., and Suchard, M. A. (2017) 'A Relaxed Directional Random Walk Model for Phylogenetic Trait Evolution', Systematic Biology, 66: 299–319.
Gill, M. S., Lemey, P., Bennett, S. N., Biek, R., and Suchard, M. A. (2016) 'Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates', Systematic Biology, 65: 1041–56.
Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L. N., Daly, J. M., Mumford, J. A., and Holmes, E. C. (2004) 'Unifying the Epidemiological and Evolutionary Dynamics of Pathogens', Science, 303: 327–32.
Hall, M., Woolhouse, M., and Rambaut, A. (2015) 'Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set', PLoS Computational Biology, 11: e1004613.
Lartillot, N., and Philippe, H. (2006) 'Computing Bayes Factors Using Thermodynamic Integration', Systematic Biology, 55: 195–207.
Lemey, P., Minin, V. N., Bielejec, F., Pond, S. L. K., and Suchard, M. A. (2012) 'A Counting Renaissance: Combining Stochastic Mapping and Empirical Bayes to Quickly Detect Amino Acid Sites under Positive Selection', Bioinformatics, 28: 3248–56.
Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Lemey, P., Rambaut, A., Welch, J., and Suchard, M. (2010) 'Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time', Molecular Biology and Evolution, 27: 1877–85.
Minin, V. N., and Suchard, M. A. (2008) 'Fast, Accurate and Simulation-Free Stochastic Mapping', Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 363: 3985–95.
Quick, J., Loman, N., Duraffour, S., Simpson, J., et al. (2016) 'Real-Time, Portable Genome Sequencing for Ebola Surveillance', Nature, 530: 228–32.
Sagulenko, P., Puller, V., and Neher, R. A. (2018) 'TreeTime: Maximum-Likelihood Phylodynamic Analysis', Virus Evolution, 4: vex042.
Smith, D. J., Lapedes, A. S., de Jong, J. C., Bestebroer, T. M., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., and Fouchier, R. A. M. (2004) 'Mapping the Antigenic and Genetic Evolution of Influenza Virus', Science, 305: 371–6.
Suchard, M. A., Kitchen, C. M. R., Sinsheimer, J. S., and Weiss, R. E. (2003) 'Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data', Systematic Biology, 52: 649–64.
To, T.-H., Jung, M., Lycett, S., and Gascuel, O. (2016) 'Fast Dating Using Least-Squares Criteria and Algorithms', Systematic Biology, 65: 82–97.
Tolkoff, M. R., Alfaro, M. E., Baele, G., Lemey, P., and Suchard, M. A. (2018) 'Phylogenetic Factor Analysis', Systematic Biology, 67: 384–99.
Volz, E., and Frost, S. (2017) 'Scalable Relaxed Clock Phylogenetic Dating', Virus Evolution, 3: vex025.
Vrancken, B., Rambaut, A., Suchard, M. A., Drummond, A., Baele, G., Derdelinckx, I., Van Wijngaerden, E., Vandamme, A.-M., Van Laethem, K., and Lemey, P. (2014) 'The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates', PLoS Computational Biology, 10: e1003505.
Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M. H. (2011) 'Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection', Systematic Biology, 60: 150–60.
Many traits in phylogenetics are represented as or partitionedinto a finite number of discrete values with geographical loca-tion standing out as a popular example Because BEAST is dedi-cated to sampling time-scaled phylogenies new developmentsof discrete character mapping enable the reconstruction oftimed viral dispersal patterns while accommodating phyloge-netic uncertainty By extending the discrete diffusion models toincorporate empirical data as covariates or predictors of transi-tion rates BEAST can simultaneously test and quantify a rangeof potential predictive variables of the diffusion process (Lemeyet al 2014) Further realizations of the trait transition processcan also be efficiently produced to pinpoint the nature and tim-ing of changes in evolutionary history beyond ancestral nodestate reconstruction (termed Markov jumps) or to infer the timespent in a particular state (Markov rewards) (Minin and Suchard2008) For molecular data fast stochastic mapping approachesare also employed to obtain site-specific dN=dS estimates inte-grating over the posterior distribution of phylogenies and an-cestral reconstructions to quantify uncertainty on thesemeasures of the selective forces on individual codons (Lemeyet al 2012)
Multivariate continuous traits are incorporated using phylo-genetic Brownian diffusion processes modelling the shared an-cestral dependence across taxa and the correlations betweenthese variables Such continuous models have most frequentlybeen applied to diffusion on a geographical landscape with thetraits representing coordinates and the phylogeny reconstruct-ing the epidemiological process within the host population(Lemey et al 2010) The landscapes can also represent otherspaces and integration of antibody binding assay data have ex-tended lsquoantigenic cartographyrsquo (Smith et al 2004) approaches tomodel simultaneous antigenic and genetic evolution and inferthe viral trajectories in the immunological space generated bythe host population (Bedford et al 2014)
Standard Brownian diffusion processes that assume a zero-mean displacement along each branch may however be unreal-istic for many evolutionary problems (including geographicalreconstruction) A recently developed relaxed directional ran-dom walk allows the diffusion processes to take on different di-rectional trends in different parts of the phylogeny whilepreserving model identifiability (Gill et al 2017) and opens upthese processes for a wide range of applications BEAST 110also extends multivariate phylogenetic diffusion to latent liabil-ity model formulations in order to assess correlations betweentraits of different data types including (various combinations
of) continuous binary and discrete traits (Cybis et al 2015) asdemonstrated by applications to flower morphology antibioticresistance and viral epitope evolution To infer correlations be-tween high-dimensional traits computationally efficiently anovel phylogenetic factor analysis approach assumes that asmall unknown number of independent evolutionary factorsevolve along the phylogeny and generate clusters of dependenttraits at the tips (Tolkoff et al 2018)
Further extending the data integration approach BEAST 110
includes a flexible framework for incorporating time-varyingcovariates of the effective population size over time This usesGaussian Markov random fields to reconstruct smoothed effec-tive population size trajectories while simultaneously estimat-ing to what extent predictor variables (eg fluctuations inclimatic factors host mobility or vector density) may havedriven the dynamics (Gill et al 2016) Using a similar general-ized linear modeling (GLM) approach classical epidemiologicaltime-series data such as case counts (Gill et al 2016) can be inte-grated with pathogen genome sequence data to provide joint in-ference of important epidemiological parameters
Finally recent host-transmission models allow the integra-tion of complete or partial knowledge of a pathogenrsquos transmis-sion history enabling the simultaneous inference of within-host population dynamics viral evolutionary processes andtransmission times and bottlenecks (Vrancken et al 2014)Likewise other priors enable the reconstruction of transmissiontrees of infectious disease epidemics and outbreaks while ac-commodating phylogenetic uncertainty and employ a newlydesigned set of phylogenetic tree proposals that respect nodepartitions (Hall et al 2015)
3 Flexible model design
BEASTrsquos companion graphical user interface program BEAUtiallows the user to import data select models choose prior dis-tributions and specify the settings for both Bayesian inferenceand marginal likelihood estimation Our efforts on BEAUti 110have focused on allowing the user to easily link or unlink substi-tution clock and tree models across multiple partitions as well
as linking individual parameters to provide considerable adapt-ability in model design Additionally BEAUti can also group var-ious parameters in a hierarchical phylogenetic model prior(Suchard et al 2003) which allows parameters to take differentvalues but be linked by a common distribution the parametersof which can then be inferred For example flexible codonmodel parameterizations using hierarchical phylogenetic mod-els (Baele et al 2016b) and incorporating a range of potentialpredictive variables for substitution behaviour (Bielejec et al2016a) provide insight into the tempo and mode of pathogenevolution
Marginal likelihood estimation to compare models usingBayes factors has become common practice in Bayesian phylo-genetic inference BEAST 110 now features marginal likelihoodestimation (Baele et al 2012) using path sampling (Gelman andMeng 1998 Lartillot and Philippe 2006) and stepping-stone sam-pling (Xie et al 2011) as well as the recently developed general-ized stepping-stone sampling (Fan et al 2011 Baele et al 2016a)that offers increased accuracy and improved numerical stabilityby employing the concept of lsquoworking distributionsrsquo ie distri-butions with known normalizing constants and parameterizedusing samples from the posterior distribution
2 | Virus Evolution 2018 Vol 4 No 1
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
4 Performance and efficiency
Increasing model complexity and sequence availability inmodern-day analyses have stretched the computationaldemands of Bayesian phylogenetic inference To improve effi-ciency for large-scale sequence data BEAST 110 uses theBEAGLE library (Ayres et al 2012) that provides access to mas-sive parallelization on a range of computing architecturesIn particular the combination of BEAST 110 with BEAGLE 30(Ayres et al under review) allows multiple data partitions to beparallelized across a single high-performance device (ie aGPGPU graphics board) allowing for the utilization of the full ca-pacity of these devices reducing the computational overheadsAs the complexity of phylogenetic model designs increase con-comitant with the surge in scale of genomic data updating onlya parameter associated with a single data partition limits theoccupation of the massively multicore devices To address thiswe have developed an adaptive multivariate transition kernelthat simultaneously updates parameters across all the parti-tioned data making more efficient use of available hardware(Baele et al 2017) Through a combination of these two
advances BEAST 110 can yield a sizeable increase in effectivelyindependent posterior samples per unit-time over previoussoftware versions For the example data described below wesee a 5- to 25-fold improvement depending on the model pa-rameter using an NVIDIA Titan V
41 Example
Figure 1 presents a spatiotemporal reconstruction of Ebola virusevolution and spread during the 2013ndash2016 West African epi-demic highlighting several aspects of phylodynamic data inte-gration The estimates are based on a large data set of 1610genomes that represent over 5 per cent of the known cases(Dudas et al 2017) Administrative regions (nfrac14 56) are includedas discrete sampling locations to estimate viral dispersalthrough time while testing the contribution of a set of potentialcovariates to the pattern of spread using a GLM parameteriza-tion of phylogeographic diffusion (Lemey et al 2014) This indi-cates for example the importance of population sizes andgeographic distance to explain viral dispersal intensities
Figure 1 Phylodynamic analysis of the 2013ndash2016 West African Ebola virus epidemic encompassing simultaneous estimation of sequence and discrete (geographic)
trait data with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations Plotted are a snapshot of geo-
graphic spread using SpreaD3 (Bielejec et al 2016b) the maximum clade credibility tree the posterior estimates of the GLM coefficients for seven possible predictors
for Ebola virus spread (Bayes Factor support values of 3 20 and 150 are indicated by vertical lines) and the effective population size through time estimated by incorpo-
rating case counts
M A Suchard et al | 3
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
5 Relationship to BEAST2 and other software
Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains
A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree
51 Availability
BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)
Acknowledgements
We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo
KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research
Conflict of interest None declared
ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30
Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]
Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3
Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M. A., and Alekseyenko, A. V. (2012) 'Improving the Accuracy of Demographic and Molecular Clock Model Comparison While Accommodating Phylogenetic Uncertainty', Molecular Biology and Evolution, 29: 2157–67.
Baele, G., Lemey, P., Rambaut, A., and Suchard, M. A. (2017) 'Adaptive MCMC in Bayesian Phylogenetics: An Application to Analyzing Partitioned Data in BEAST', Bioinformatics, 33: 1798–805.
Baele, G., Lemey, P., and Suchard, M. A. (2016a) 'Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty', Systematic Biology, 65: 250–64.
Baele, G., Suchard, M. A., Bielejec, F., and Lemey, P. (2016b) 'Bayesian Codon Substitution Modeling to Identify Sources of Pathogen Evolutionary Rate Variation', Microbial Genomics, 2: e00005.
Bedford, T., Suchard, M. A., Lemey, P., Dudas, G., Gregory, V., Hay, A. J., McCauley, J. W., Russell, C. A., Smith, D. J., and Rambaut, A. (2014) 'Integrating Influenza Antigenic Dynamics with Molecular Evolution', eLife, 3: e01914.
Bielejec, F., Baele, G., Rodrigo, A. G., Suchard, M. A., and Lemey, P. (2016a) 'Identifying Predictors of Time-Inhomogeneous Viral Evolutionary Processes', Virus Evolution, 2: vew023.
Bielejec, F., Baele, G., Vrancken, B., Suchard, M. A., Rambaut, A., and Lemey, P. (2016b) 'SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes', Molecular Biology and Evolution, 33: 2167–9.
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.-H., Xie, D., Suchard, M. A., Rambaut, A., and Drummond, A. J. (2014) 'BEAST 2: A Software Platform for Bayesian Evolutionary Analysis', PLoS Computational Biology, 10: e1003537.
Cybis, G. B., Sinsheimer, J. S., Bedford, T., Mather, A. E., Lemey, P., and Suchard, M. A. (2015) 'Assessing Phenotypic Correlation through the Multivariate Phylogenetic Latent Liability Model', The Annals of Applied Statistics, 9: 969.
Drummond, A. J., Suchard, M. A., Xie, D., and Rambaut, A. (2012) 'Bayesian Phylogenetics with BEAUti and the BEAST 1.7', Molecular Biology and Evolution, 29: 1969–73.
Dudas, G., Carvalho, L. M., Bedford, T., Tatem, A. J., Baele, G., Faria, N. R., Park, D. J., Ladner, J. T., Arias, A., Asogun, D., Bielejec, F., Caddy, S. L., Cotten, M., D'Ambrozio, J., Dellicour, S., Caro, A. D., Diclaro, J. D., II, Duraffour, S., Elmore, M. J., Fakoli, L. S., III, Faye, O., Gilbert, M. L., Gevao, S. M., Gire, S., Gladden-Young, A., Gnirke, A., Goba, A., Grant, D. S., Haagmans, B. L., Hiscox, J. A., Jah, U., Kargbo, B., Kugelman, J. R., Liu, D., Lu, J., Malboeuf, C. M., Mate, S., Matthews, D. A., Matranga, C. B., Meredith, L. W., Qu, J., Quick, J., Pas, S. D., Phan, M. V. T., Pollakis, G., Reusken, C. B., Sanchez-Lockhart, M., Schaffner, S. F., Schieffelin, J. S., Sealfon, R. S., Simon-Loriere, E., Smits, S. L., Stoecker, K., Thorne, L., Tobin, E. A., Vandi, M. A., Watson, S. J., West, K., Whitmer, S., Wiley, M. R., Winnicki, S. M., Wohl, S., Wölfel, R., Yozwiak, N. L., Andersen, K. G., Blyden, S. O., Bolay, F., Carroll, M. W., Dahn, B., Diallo, B., Formenty, P., Fraser, C., Gao, G. F., Garry, R. F., Goodfellow, I., Günther, S., Happi, C. T., Holmes, E. C., Keïta, S., Kellam, P., Koopmans, M. P. G., Kuhn, J. H., Loman, N. J., Magassouba, N., Naidoo, D., Nichol, S. T., Nyenswah, T., Palacios, G., Pybus, O. G., Sabeti, P. C., Sall, A., Ströher, U., Wurie, I., Suchard, M. A., Lemey, P., and Rambaut, A. (2017) 'Virus Genomes Reveal Factors That Spread and Sustained the Ebola Epidemic', Nature, 544: 309–15.
4 | Virus Evolution, 2018, Vol. 4, No. 1
Downloaded from https://academic.oup.com/ve/article-abstract/4/1/vey016/5035211 by Edinburgh University user on 13 July 2018
Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2011) 'Choosing among Partition Models in Bayesian Phylogenetics', Molecular Biology and Evolution, 28: 523–32.
Gelman, A., and Meng, X.-L. (1998) 'Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling', Statistical Science, 13: 163–85.
Gill, M. S., Tung Ho, L. S., Baele, G., Lemey, P., and Suchard, M. A. (2017) 'A Relaxed Directional Random Walk Model for Phylogenetic Trait Evolution', Systematic Biology, 66: 299–319.
Gill, M. S., Lemey, P., Bennett, S. N., Biek, R., and Suchard, M. A. (2016) 'Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates', Systematic Biology, 65: 1041–56.
Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L. N., Daly, J. M., Mumford, J. A., and Holmes, E. C. (2004) 'Unifying the Epidemiological and Evolutionary Dynamics of Pathogens', Science, 303: 327–32.
Hall, M., Woolhouse, M., and Rambaut, A. (2015) 'Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set', PLoS Computational Biology, 11: e1004613.
Lartillot, N., and Philippe, H. (2006) 'Computing Bayes Factors Using Thermodynamic Integration', Systematic Biology, 55: 195–207.
Lemey, P., Minin, V. N., Bielejec, F., Kosakovsky Pond, S. L., and Suchard, M. A. (2012) 'A Counting Renaissance: Combining Stochastic Mapping and Empirical Bayes to Quickly Detect Amino Acid Sites under Positive Selection', Bioinformatics, 28: 3248–56.
Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., et al. (2014) 'Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2', PLoS Pathogens, 10: e1003932.
Lemey, P., Rambaut, A., Welch, J., and Suchard, M. (2010) 'Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time', Molecular Biology and Evolution, 27: 1877–85.
Minin, V. N., and Suchard, M. A. (2008) 'Fast, Accurate and Simulation-Free Stochastic Mapping', Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 363: 3985–95.
Quick, J., Loman, N., Duraffour, S., Simpson, J., et al. (2016) 'Real-Time, Portable Genome Sequencing for Ebola Surveillance', Nature, 530: 228–32.
Sagulenko, P., Puller, V., and Neher, R. A. (2018) 'TreeTime: Maximum-Likelihood Phylodynamic Analysis', Virus Evolution, 4: vex042.
Smith, D. J., Lapedes, A. S., de Jong, J. C., Bestebroer, T. M., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., and Fouchier, R. A. M. (2004) 'Mapping the Antigenic and Genetic Evolution of Influenza Virus', Science, 305: 371–6.
Suchard, M. A., Kitchen, C. M. R., Sinsheimer, J. S., and Weiss, R. E. (2003) 'Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data', Systematic Biology, 52: 649–64.
To, T.-H., Jung, M., Lycett, S., and Gascuel, O. (2016) 'Fast Dating Using Least-Squares Criteria and Algorithms', Systematic Biology, 65: 82–97.
Tolkoff, M. R., Alfaro, M. E., Baele, G., Lemey, P., and Suchard, M. A. (2018) 'Phylogenetic Factor Analysis', Systematic Biology, 67: 384–99.
Volz, E., and Frost, S. (2017) 'Scalable Relaxed Clock Phylogenetic Dating', Virus Evolution, 3: vex025.
Vrancken, B., Rambaut, A., Suchard, M. A., Drummond, A., Baele, G., Derdelinckx, I., Van Wijngaerden, E., Vandamme, A.-M., Van Laethem, K., and Lemey, P. (2014) 'The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates', PLoS Computational Biology, 10: e1003505.
Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2011) 'Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection', Systematic Biology, 60: 150–60.
M. A. Suchard et al. | 5
4 Performance and efficiency
Increasing model complexity and sequence availability in modern-day analyses have stretched the computational demands of Bayesian phylogenetic inference. To improve efficiency for large-scale sequence data, BEAST 1.10 uses the BEAGLE library (Ayres et al. 2012), which provides access to massive parallelization on a range of computing architectures. In particular, the combination of BEAST 1.10 with BEAGLE 3.0 (Ayres et al. under review) allows multiple data partitions to be parallelized across a single high-performance device (i.e. a GPGPU graphics board), so that the full capacity of these devices can be used and computational overheads are reduced. As phylogenetic model designs grow in complexity, concomitant with the surge in the scale of genomic data, updating only a parameter associated with a single data partition limits the occupancy of massively multicore devices. To address this, we have developed an adaptive multivariate transition kernel that simultaneously updates parameters across all partitioned data, making more efficient use of available hardware (Baele et al. 2017). Through a combination of these two advances, BEAST 1.10 can yield a sizeable increase in effectively independent posterior samples per unit time over previous software versions. For the example data described below, we see a 5- to 25-fold improvement, depending on the model parameter, using an NVIDIA Titan V.
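The benefit of a joint update can be illustrated with a minimal, self-contained sketch of an adaptive multivariate random-walk Metropolis kernel. All parameters (here, hypothetically, three per-partition log-rates) are proposed together from a Gaussian whose covariance is learned online from the chain's own history and scaled by the common 2.38²/d rule. The toy target, the adaptation start of 200 iterations, the initial step size and the small diagonal jitter are illustrative assumptions; this is a generic adaptive Metropolis scheme, not the Java implementation in BEAST 1.10, whose details are given in Baele et al. (2017).

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_target(x):
    # Hypothetical toy posterior over per-partition log-rates: neighbouring
    # rates are tightly tied (sd 0.1) and all are weakly centred on 0 (sd 1).
    tie = sum((x[i] - x[i + 1]) ** 2 for i in range(len(x) - 1)) / 0.01
    return -0.5 * (tie + sum(v * v for v in x))


class AdaptiveMultivariateKernel:
    """Proposes all parameters jointly from N(x, s * C_hat + eps * I)."""

    def __init__(self, dim, start_adapt=200, init_sd=0.1, eps=1e-6):
        self.dim, self.start_adapt = dim, start_adapt
        self.init_sd, self.eps = init_sd, eps
        self.scale = 2.38 ** 2 / dim  # common scaling rule for Gaussian targets
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [[0.0] * dim for _ in range(dim)]  # sums of co-deviations

    def observe(self, x):
        # Welford-style online update of the chain's mean and covariance.
        self.n += 1
        old = [x[i] - self.mean[i] for i in range(self.dim)]
        for i in range(self.dim):
            self.mean[i] += old[i] / self.n
        for i in range(self.dim):
            for j in range(self.dim):
                self.m2[i][j] += old[i] * (x[j] - self.mean[j])

    def proposal_cov(self):
        d = self.dim
        if self.n < self.start_adapt:  # fixed isotropic steps until adaptation starts
            return [[self.init_sd ** 2 if i == j else 0.0 for j in range(d)] for i in range(d)]
        return [[self.scale * self.m2[i][j] / (self.n - 1) + (self.eps if i == j else 0.0)
                 for j in range(d)] for i in range(d)]

    def propose(self, x, rng):
        low = cholesky(self.proposal_cov())
        z = [rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return [x[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(self.dim)]


def run_chain(n_iter=5000, dim=3, seed=1):
    rng = random.Random(seed)
    kernel = AdaptiveMultivariateKernel(dim)
    x = [0.0] * dim
    lp = log_target(x)
    accepted, samples = 0, []
    for _ in range(n_iter):
        y = kernel.propose(x, rng)
        lq = log_target(y)
        # Symmetric proposal: the Metropolis ratio is just the target ratio.
        if math.log(1.0 - rng.random()) < lq - lp:
            x, lp = y, lq
            accepted += 1
        kernel.observe(x)
        samples.append(list(x))
    return samples, accepted / n_iter


samples, acc = run_chain()
print(f"acceptance rate: {acc:.2f}")
```

Because every proposal touches all partition parameters at once, each step requires re-evaluating the likelihood of every partition. That is the kind of workload a many-core device can process in one batch, whereas a one-parameter-at-a-time update leaves most of the device idle.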
4.1 Example
Figure 1 presents a spatiotemporal reconstruction of Ebola virus evolution and spread during the 2013–2016 West African epidemic, highlighting several aspects of phylodynamic data integration. The estimates are based on a large data set of 1,610 genomes that represent over 5 per cent of the known cases (Dudas et al. 2017). Administrative regions (n = 56) are included as discrete sampling locations to estimate viral dispersal through time, while the contribution of a set of potential covariates to the pattern of spread is tested using a GLM parameterization of phylogeographic diffusion (Lemey et al. 2014). This analysis indicates, for example, the importance of population sizes and geographic distance in explaining viral dispersal intensities.
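A minimal sketch of the GLM idea, assuming the log-linear form described by Lemey et al. (2014), in which the rate from location i to location j is exp(Σ_k β_k δ_k X_ijk) with coefficients β_k, binary inclusion indicators δ_k and predictors X_k that have already been log-transformed and standardized. The three locations and all predictor and coefficient values are hypothetical, not estimates from the Ebola data; any normalization of the matrix and the overall clock rate are omitted.

```python
import math


def glm_rate_matrix(predictors, beta, delta):
    """Build a CTMC rate matrix whose off-diagonal rates follow a log-linear GLM.

    predictors: K square matrices (n x n, diagonals ignored), assumed already
        log-transformed and standardized.
    beta: K coefficients; delta: K binary inclusion indicators.
    """
    n = len(predictors[0])
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                eta = sum(b * d * x[i][j] for b, d, x in zip(beta, delta, predictors))
                q[i][j] = math.exp(eta)
        # Diagonal entries make each row sum to zero, as for any CTMC generator.
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Hypothetical standardized predictors for three locations (rows = origin).
log_distance = [[0.0, -1.0, 1.0], [-1.0, 0.0, 0.5], [1.0, 0.5, 0.0]]
log_dest_pop = [[0.0, 0.8, -0.4], [0.2, 0.0, -0.4], [0.2, 0.8, 0.0]]
q = glm_rate_matrix([log_distance, log_dest_pop], beta=[-0.9, 0.6], delta=[1, 1])
for row in q:
    print([round(v, 3) for v in row])
```

Setting an indicator δ_k to 0 removes predictor k from every pairwise rate at once, which is what lets the sampler move predictors in and out of the model.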
Figure 1. Phylodynamic analysis of the 2013–2016 West African Ebola virus epidemic, encompassing simultaneous estimation of sequence and discrete (geographic) trait data, with a GLM fitted to the discrete trait model in order to establish potential predictors of viral transition between locations. Plotted are a snapshot of geographic spread using SpreaD3 (Bielejec et al. 2016b), the maximum clade credibility tree, the posterior estimates of the GLM coefficients for seven possible predictors for Ebola virus spread (Bayes factor support values of 3, 20 and 150 are indicated by vertical lines), and the effective population size through time estimated by incorporating case counts.
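The Bayes factor thresholds in the caption can be made concrete. A common way to compute support for a predictor is to compare the posterior and prior odds of its indicator δ_k being 1, BF = [p/(1 - p)] / [q/(1 - q)], where p is the fraction of posterior samples with δ_k = 1 and q is the prior inclusion probability. In this sketch q is simply passed in as an assumption (in practice it follows from the prior placed on the indicators), and the labels follow the widely used Kass–Raftery scale, whose cut-offs coincide with the 3, 20 and 150 lines in Figure 1.

```python
import math


def inclusion_bayes_factor(indicator_samples, prior_inclusion):
    """Posterior odds over prior odds that a predictor's indicator equals 1."""
    p = sum(indicator_samples) / len(indicator_samples)
    if p == 1.0:
        return math.inf  # every sample includes the predictor; BF is unbounded
    posterior_odds = p / (1.0 - p)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


def support_label(bf):
    # Kass-Raftery categories; the 3, 20 and 150 cut-offs match Figure 1.
    if bf > 150:
        return "very strong"
    if bf > 20:
        return "strong"
    if bf > 3:
        return "positive"
    return "not worth more than a bare mention"


# Hypothetical indicator trace: predictor included in 90 of 100 samples.
bf = inclusion_bayes_factor([1] * 90 + [0] * 10, prior_inclusion=0.1)
print(round(bf, 1), support_label(bf))
```

With p = 0.9 and q = 0.1, the posterior odds (9) divided by the prior odds (1/9) give BF = 81, which falls in the "strong" band.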
5 Relationship to BEAST2 and other software
Distinct from BEAST 1.10 described here, BEAST2 is an independent project (Bouckaert et al. 2014) intended as a platform that more readily facilitates the development of packages of models and analyses by other researchers. Although both projects share many of the same models and the underlying inference framework, BEAST has increasingly focused on the evolution and epidemiology of rapidly evolving pathogens. We affirm that BEAST will continue to be developed in parallel with BEAST2: while the two projects share a recent common origin, each now aims to foster complementary research domains.
A range of other software focusing on phylodynamic analyses of fast-evolving pathogens has been described since the last version of BEAST was published. Of particular note are LSD (To et al. 2016), TreeDater (Volz and Frost 2017) and TreeTime (Sagulenko et al. 2018). These programs use least-squares algorithms (LSD) or maximum-likelihood inference (TreeDater, TreeTime) and provide rapid analysis of large data sets for a subset of the models that BEAST provides. However, the former program (LSD) implements very limited phylodynamic models, and the latter two programs (TreeDater and TreeTime) require a phylogenetic tree inferred using other software as input data, conditioning parameter estimates on this single tree.
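For readers unfamiliar with least-squares dating, the simplest relative of the idea is a root-to-tip regression: under a strict clock, each tip's divergence from the root grows linearly with its sampling date, so an ordinary least-squares fit gives the evolutionary rate as the slope and the root date as the x-intercept. This sketch uses made-up (date, distance) pairs and is not LSD's algorithm, which works with the branch lengths of the whole input tree rather than with root-to-tip distances alone (To et al. 2016).

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date, r_squared): the slope is a strict-clock rate
    (substitutions/site/year) and root_date is where the line reaches zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, root_date, r_squared


# Hypothetical tips sampled over 18 months, evolving at exactly 1e-3 subs/site/year.
dates = [2014.0, 2014.5, 2015.0, 2015.5]
distances = [0.0005, 0.0010, 0.0015, 0.0020]
print(root_to_tip_regression(dates, distances))
```

For these made-up points the fit recovers a rate of 0.001 and a root date of 2013.5, up to floating-point rounding.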
5.1 Availability
BEAST 1.10 is open source under the GNU Lesser General Public License and available at https://beast-dev.github.io/beast-mcmc for cross-platform compiled programs and https://github.com/beast-dev/beast-mcmc for software development and source code. It requires Java version 1.6 or greater. Documentation, tutorials and help are available at http://beast.community, and many users actively discuss BEAST usage and development in the 'beast-users' Google Group discussion group (http://groups.google.com/group/beast-users). We also host an expanding suite of R tools designed for posterior analyses using BEAST (https://github.com/beast-dev/RBeast).
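The kind of posterior summary such tools provide, and the effective-sample-size (ESS) measure behind the efficiency comparison in Section 4, can be sketched in a few lines. This example assumes the usual tab-delimited BEAST trace log layout ('#' comment lines followed by a header row beginning with 'state') and that every column is numeric; it is written in Python for illustration and is not part of the RBeast tools. The ESS estimator sums autocorrelations up to the first non-positive lag, a simple textbook truncation rule that may differ in detail from estimators used by dedicated trace-analysis tools.

```python
import csv
import io


def read_beast_log(handle, burnin_fraction=0.1):
    """Parse a tab-delimited trace log, skipping '#' lines and the burn-in rows."""
    lines = [line for line in handle if line.strip() and not line.startswith("#")]
    records = [{k: float(v) for k, v in row.items()}
               for row in csv.DictReader(lines, delimiter="\t")]
    return records[int(len(records) * burnin_fraction):]


def effective_sample_size(xs):
    """N / (1 + 2 * sum of autocorrelations), summing lags until the first
    non-positive autocorrelation (a simple truncation rule)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 0.0  # constant trace: no information to measure
    tau = 1.0
    for lag in range(1, n):
        acf = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if acf <= 0.0:
            break
        tau += 2.0 * acf
    return n / tau


# Hypothetical four-row log; for a real run, pass e.g. open("run.log") instead.
sample_log = io.StringIO(
    "# hypothetical header comment\nstate\tposterior\n"
    "0\t-10.5\n1000\t-10.1\n2000\t-9.8\n3000\t-10.0\n"
)
trace = [r["posterior"] for r in read_beast_log(sample_log, burnin_fraction=0.25)]
print(len(trace), effective_sample_size(trace))
```

Dividing the ESS by wall-clock run time gives effectively independent posterior samples per unit time, the measure used to compare software versions above.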
Acknowledgements
We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo
KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research
Conflict of interest None declared
ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30
Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]
Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3
Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67
amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805
amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64
Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005
Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914
Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023
amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9
Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537
Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969
Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73
Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J
4 | Virus Evolution 2018 Vol 4 No 1
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15
Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32
Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85
Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319
Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56
Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32
Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613
Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207
Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect
Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56
Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932
amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85
Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95
Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32
Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042
Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6
Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64
To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97
Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99
Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025
Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505
Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60
M A Suchard et al | 5
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
- l
-
5 Relationship to BEAST2 and other software
Distinct from BEAST 110 described here BEAST2 is an indepen-dent project (Bouckaert et al 2014) intended as a platform thatmore readily facilitates the development of packages of modelsand analyses by other researchers Although both projects sharemany of the same models and the underlying inference frame-work BEAST has increasingly focused on the analysis of rapidlyevolving pathogens and their evolution and epidemiology Weaffirm that BEAST will continue to be developed in parallel tothe BEAST2 While these projects share a recent common origineach now aims to foster complementary research domains
A range of other software focusing on phylodynamic analy-ses of fast-evolving pathogens has been described since the lastversion of BEAST was published Of particular note are LSD(To et al 2016) TreeDater (Volz and Frost 2017) and TreeTime(Sagulenko et al 2018) These programs use least-squares algo-rithms (LSD) or maximum likelihood inference (TreeDaterTreeTime) and provide rapid analysis on large data sets for asubset of the models that BEAST provides However the formerprogram implements very limited phylodynamic models andthe latter two programs require a phylogenetic tree inferred us-ing other software as input data conditioning parameter esti-mates on this single tree
51 Availability
BEAST 110 is open source under the GNU lesser general publiclicense and available at httpsbeast-devgithubiobeast-mcmcfor cross-platform compiled programs and httpsgithubcombeast-devbeast-mcmc for software development and sourcecode It requires Java version 16 or greater Documentationtutorials and help are available at httpbeastcommunity andmany users actively discuss BEAST usage and development inthe lsquobeast-usersrsquo GoogleGroup discussion group (httpgroupsgooglecomgroupbeast-users) We also host an expandingsuite of R toolsmdashdesigned for posterior analyses using BEAST(httpsgithubcombeast-devRBeast)
Acknowledgements
We would like to thank the many developers and contribu-tors to BEAST 110 including Alex Alekseyenko TrevorBedford Filip Bielejec Erik Bloomquist Luiz CarvalhoGabriela Cybis Gytis Dudas Roald Forsberg Mandev GillMatthew Hall Joseph Heled Sebastian Hoehna DeniseKuehnert Wai Lok Sibon Li Gerton Lunter SidneyMarkowitz Vladimir Minin Julia Palacios Michael DefoinPlatel Oliver Pybus Beth Shapiro Korbinian Strimmer MaxTolkoff Chieh-Hsi Wu and Walter Xie This work was sup-ported in part by the European Union Seventh FrameworkProgramme for research technological developmentand demonstration under Grant Agreement no 278433-PREDEMICS and no 725422-ReservoirDOCS TheVIROGENESIS project receives funding from the EuropeanUnionrsquos Horizon 2020 research and innovation programmeunder grant agreement No 634650 The Artic Networkreceives funding from the Wellcome Trust through project206298Z17Z MAS is partly supported by NSF grant DMS1264153 and NIH grants R01 HG006139 R01 AI107034 andU19 AI135995 PL acknowledges support by the SpecialResearch Fund KU Leuven (lsquoBijzonder Onderzoeksfondsrsquo
KU Leuven OT14115) and the Research FoundationmdashFlanders (lsquoFonds voor Wetenschappelijk OnderzoekmdashVlaanderenrsquo G066215N G0D5117N and G0B9317N) GBacknowledges support from the Interne Fondsen KULeuvenInternal Funds KU Leuven DLA is supported by NSFgrant DBI 1661443 We gratefully acknowledge support fromNVIDIA Corporation with the donation of parallel comput-ing resources used for this research
Conflict of interest None declared
ReferencesAyres D L Cummings M P et al lsquoUnder review BEAGLE 30
Improved Usability for a High-Performance Computing Libraryfor Statistical Phylogeneticsrsquo Systematic Biology [WorldCat]
Darling A Zwickl D J Beerli P Holder M T Lewis PO Huelsenbeck J P Ronquist F Swofford D L CummingsM P Rambaut A and Suchard M A (2012) lsquoBEAGLE AnApplication Programming Interface and High-PerformanceComputing Library for Statistical Phylogeneticsrsquo SystematicBiology 61 170ndash3
Baele G Lemey P Bedford T Rambaut A Suchard M A andAlekseyenko A V (2012) lsquoImproving the Accuracy ofDemographic and Molecular Clock Model Comparison WhileAccommodating Phylogenetic Uncertaintyrsquo Molecular Biologyand Evolution 29 2157ndash67
amp Rambaut A and Suchard M A (2017) lsquoAdaptiveMCMC in Bayesian Phylogenetics An Application to AnalyzingPartitioned Data in BEASTrsquo Bioinformatics 33 1798ndash805
amp and Suchard M A (2016a) lsquoGenealogical WorkingDistributions for Bayesian Model Testing with PhylogeneticUncertaintyrsquo Systematic Biology 65 250ndash64
Suchard M A Bielejec F and Lemey P (2016b) lsquoBayesianCodon Substitution Modeling to Identify Sources of PathogenEvolutionary Rate Variationrsquo Microbial Genomics 2 e00005
Bedford T Suchard M A Lemey P Dudas G Gregory VHay A J McCauley J W Russell C A Smith D J andRambaut A (2014) lsquoIntegrating Influenza Antigenic Dynamicswith Molecular Evolutionrsquo eLife 3 e01914
Bielejec F Baele G Rodrigo A G Suchard M A and Lemey P(2016a) lsquoIdentifying Predictors of Time-Inhomogeneous ViralEvolutionary Processesrsquo Virus Evolution 2 vew023
amp Vrancken B Suchard M A Rambaut A andLemey P (2016b) lsquoSpreaD3 Interactive Visualization ofSpatiotemporal History and Trait Evolutionary ProcessesrsquoMolecular Biology and Evolution 33 2167ndash9
Bouckaert R Heled J Kuhnert D Vaughan T Wu C-H XieD Suchard M A Rambaut A and Drummond A J (2014)lsquoBEAST 2 A Software Platform for Bayesian EvolutionaryAnalysisrsquo PLoS Computational Biology 10 e1003537
Cybis G B Sinsheimer J S Bedford T Mather A E Lemey Pand Suchard M A (2015) lsquoAssessing Phenotypic Correlationthrough the Multivariate Phylogenetic Latent Liability ModelrsquoThe Annals of Applied Statistics 9 969
Drummond A J Suchard M A Xie D and Rambaut A (2012)lsquoBayesian Phylogenetics with BEAUti and the BEAST 17rsquoMolecular Biology and Evolution 29 1969ndash73
Dudas G Carvalho L M Bedford T Tatem A J Baele GFaria N R Park D J Ladner J T Arias A Asogun DBielejec F Caddy S L Cotten M DrsquoAmbrozio J DellicourS Caro A D Diclaro J D II Durrafour S Elmore M J
4 | Virus Evolution 2018 Vol 4 No 1
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15
Fan Y Wu R Chen M H Kuo L and Lewis P O (2011)lsquoChoosing among Partition Models in Bayesian PhylogeneticsrsquoMolecular Biology and Evolution 28 523ndash32
Gelman A and Meng X-L (1998) lsquoSimulating NormalizingConstants From Importance Sampling to Bridge Sampling toPath Samplingrsquo Statistical Science 13 163ndash85
Gill M S Ho T Si L Baele G Lemey P and Suchard M A(2017) lsquoA Relaxed Directional Random Walk Model forPhylogenetic Trait Evolutionrsquo Systematic Biology 66 299ndash319
Lemey P Bennett S N Biek R and Suchard M A (2016)lsquoUnderstanding past Population Dynamics BayesianCoalescent-Based Modeling with Covariatesrsquo SystematicBiology 65 1041ndash56
Grenfell B T Pybus O G Gog J R Wood J L N Daly J MMumford J A and Holmes E C (2004) lsquoUnifying theEpidemiological and Evolutionary Dynamics of PathogensrsquoScience 303 327ndash32
Hall M Woolhouse M and Rambaut A (2015) lsquoEpidemicReconstruction in a Phylogenetics Framework TransmissionTrees as Partitions of the Node Setrsquo PLoS Computational Biology11 e1004613
Lartillot N and Philippe H (2006) lsquoComputing Bayes FactorsUsing Thermodynamic Integrationrsquo Systematic Biology 55195ndash207
Lemey P Minin V N Bielejec F Pond S L K andSuchard M A (2012) lsquoA Counting Renaissance CombiningStochastic Mapping and Empirical Bayes to Quickly Detect
Amino Acid Sites under Positive Selectionrsquo Bioinformatics28 3248ndash56
Rambaut A Bedford T Faria N Bielejec F Baele GRussell C A Smith D J Pybus O G Brockmann D et al(2014) lsquoUnifying Viral Genetics and Human TransportationData to Predict the Global Transmission Dynamics of HumanInfluenza H3N2rsquo PLoS Pathogens 10 e1003932
amp Welch J and Suchard M (2010) lsquoPhylogeographyTakes a Relaxed Random Walk in Continuous Space andTimersquo Molecular Biology and Evolution 27 1877ndash85
Minin V N and Suchard M A (2008) lsquoFast Accurate andSimulation-Free Stochastic Mappingrsquo Philosophical Transactionsof the Royal Society of London Series B Biological Sciences 3633985ndash95
Quick J Loman N Duraffour S Simpson J et al (2016)lsquoReal-Time Portable Genome Sequencing for EbolaSurveillancersquo Nature 530 228ndash32
Sagulenko P Puller V and Neher R A (2018) lsquoTreetimeMaximum-Likelihood Phylodynamic Analysisrsquo Virus Evolution4 vex042
Smith D J Lapedes A S de Jong J C Bestebroer T MRimmelzwaan G F Osterhaus A D M E and Fouchier R AM (2004) lsquoMapping the Antigenic and Genetic Evolution ofInfluenza Virusrsquo Science 305 371ndash6
Suchard M A Kitchen C M R Sinsheimer J S and WeissR E (2003) lsquoHierarchical Phylogenetic Models forAnalyzing Multipartite Sequence Datarsquo Systematic Biology52 649ndash64
To T-H Jung M Lycett S and Gascuel O (2016) lsquoFast DatingUsing Least-Squares Criteria and Algorithmsrsquo SystematicBiology 65 82ndash97
Tolkoff M R Alfaro M E Baele G Lemey P and SuchardM A 2018 lsquoPhylogenetic Factor Analysisrsquo Systematic Biology67 384ndash99
Volz E and Frost S (2017) lsquoScalable Relaxed Clock PhylogeneticDatingrsquo Virus Evolution 3 vex025
Vrancken B Rambaut A Suchard M A Drummond A BaeleG Derdelinckx I Van Wijngaerden E Vandamme A-MVan Laethem K and Lemey P (2014) lsquoThe GenealogicalPopulation Dynamics of HIV-1 in a Large Transmission ChainBridging within and among Host Evolutionary Ratesrsquo PLoSComputational Biology 10 e1003505
Xie W Lewis P O Fan Y Kuo L and Chen M H (2011)lsquoImproving Marginal Likelihood Estimation for BayesianPhylogenetic Model Selectionrsquo Systematic Biology 60 150ndash60
M A Suchard et al | 5
Downloaded from httpsacademicoupcomvearticle-abstract41vey0165035211by Edinburgh University useron 13 July 2018
- l
-
Fakoli L S III Faye O Gilbert M L Gevao S M Gire SGladden-Young A Gnirke A Goba A Grant D SHaagmans B L Hiscox J A Jah U Kargbo B Kugelman JR Liu D Lu J Malboeuf C M Mate S Matthews D AMatranga C B Meredith L W Qu J Quick J Pas S DPhan M V T Pollakis G Reusken C B Sanchez-LockhartM Schaffner S F Schieffelin J S Sealfon R S Simon-Loriere E Smits S L Stoecker K Thorne L Tobin E AVandi M A Watson S J West K Whitmer S Wiley M RWinnicki S M Wohl S Wolfel R Yozwiak N L AndersenK G Blyden S O Bolay F Carroll M W Dahn B Diallo BFormenty P Fraser C Gao G F Garry R F Goodfellow IGunther S Happi C T Holmes E C Keıta S Kellam PKoopmans M P G Kuhn J H Loman N J Magassouba NNaidoo D Nichol S T Nyenswah T Palacios G Pybus OG Sabeti P C Sall A Stroher U Wurie I Suchard M ALemey P and Rambaut A (2017) lsquoVirus Genomes RevealFactors That Spread and Sustained the Ebola EpidemicrsquoNature 544 309ndash15
Fan, Y., Wu, R., Chen, M. H., Kuo, L., and Lewis, P. O. (2011) ‘Choosing among Partition Models in Bayesian Phylogenetics’, Molecular Biology and Evolution, 28: 523–32.
Gelman, A., and Meng, X.-L. (1998) ‘Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling’, Statistical Science, 13: 163–85.
Gill, M. S., Ho, L. S. T., Baele, G., Lemey, P., and Suchard, M. A. (2017) ‘A Relaxed Directional Random Walk Model for Phylogenetic Trait Evolution’, Systematic Biology, 66: 299–319.
Gill, M. S., Lemey, P., Bennett, S. N., Biek, R., and Suchard, M. A. (2016) ‘Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates’, Systematic Biology, 65: 1041–56.
Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L. N., Daly, J. M., Mumford, J. A., and Holmes, E. C. (2004) ‘Unifying the Epidemiological and Evolutionary Dynamics of Pathogens’, Science, 303: 327–32.
Hall, M., Woolhouse, M., and Rambaut, A. (2015) ‘Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set’, PLoS Computational Biology, 11: e1004613.
Lartillot, N., and Philippe, H. (2006) ‘Computing Bayes Factors Using Thermodynamic Integration’, Systematic Biology, 55: 195–207.
Lemey, P., Minin, V. N., Bielejec, F., Pond, S. L. K., and Suchard, M. A. (2012) ‘A Counting Renaissance: Combining Stochastic Mapping and Empirical Bayes to Quickly Detect Amino Acid Sites under Positive Selection’, Bioinformatics, 28: 3248–56.
Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., et al. (2014) ‘Unifying Viral Genetics and Human Transportation Data to Predict the Global Transmission Dynamics of Human Influenza H3N2’, PLoS Pathogens, 10: e1003932.
Lemey, P., Rambaut, A., Welch, J., and Suchard, M. (2010) ‘Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time’, Molecular Biology and Evolution, 27: 1877–85.
Minin, V. N., and Suchard, M. A. (2008) ‘Fast, Accurate and Simulation-Free Stochastic Mapping’, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 363: 3985–95.
Quick, J., Loman, N., Duraffour, S., Simpson, J., et al. (2016) ‘Real-Time, Portable Genome Sequencing for Ebola Surveillance’, Nature, 530: 228–32.
Sagulenko, P., Puller, V., and Neher, R. A. (2018) ‘TreeTime: Maximum-Likelihood Phylodynamic Analysis’, Virus Evolution, 4: vex042.
Smith, D. J., Lapedes, A. S., de Jong, J. C., Bestebroer, T. M., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., and Fouchier, R. A. M. (2004) ‘Mapping the Antigenic and Genetic Evolution of Influenza Virus’, Science, 305: 371–6.