Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter...

7
Supporting Information Estep et al. 10.1073/pnas.1404177111 SI Materials and Methods Taxon Sampling. We sampled 114 accessions representing two outgroup species (Paspalum malacophyllum and Plagiantha tenella), two species of Arundinella, and 100 species of Andropogoneae in 40 genera. Plant material came from our own collections, the US Department of Agriculture (USDA) Germplasm Resources In- formation Network (GRIN), the Kew Millenium Seed Bank, and material sent by colleagues. All plants acquired as seeds (e.g., those from USDA and from Kew) were grown to flowering in the greenhouse at the University of Missouri-St. Louis to verify iden- tification. Vouchers are listed in Table S1. Sequencing and Processing. Total genomic DNA was extracted using a modified cetyl triethylammonium bromide (CTAB) procedure (1) or Qiagen DNeasy kits, following the manu- facturers protocol (Qiagen) (2). Five regions of four loci were PCR amplified following Estep et al. (3). The loci are Aberrant panicle organization1 (apo1), Dwarf8(d8), two exons of Erect panicle2 (ep2), and Retarded palea1 (rep1). In sorghum (a dip- loid) apo1 and d8 are on chromosomes 1 and 10, respectively. ep2 and rep1 are on chromosome 2, 14.5 Mbp apart; based on estimated recombination frequency in euchromatic regions, this distance could be ca. 50 cM (4). Thus, the four markers are unlinked. Each marker has homologs on two chromosomes in maize, as expected in a tetraploid. Importantly for this study, we found no evidence that these loci are lost after polyploidization; in known polyploids, paralogues are found consistently as long as sequence depth is sufficient. PCR products were gel purified, cloned using pGEM-T easy kits, and transformed into JM109 high-efficiency competent cells, following the manufacturersprotocols (Promega) or using a Topo-4 cloning vector and transformed into One Shot Top 10 cells (Invitrogen). At least eight positive clones for each PCR product were sequenced in both directions using universal pri- mers (M13, Sp6, or T7) on an ABI3730 DNA sequencer at the Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually or using Geneious Pro-5.5.6 (BioMatters), and reverse and forward sequences for each clone were assembled. Only clones with 80% or more double-stranded sequence were used for downstream analyses. Internal primers were designed as necessary for loci over 1,000 bp long (D8 and Ep2 exon7). All good quality contigs for each sample were then aligned using Geneious, and primer sequences were removed. Recombinants were initially identified by eye and then con- firmed with networks in SplitsTree (5), and removed from the alignment. Use of computational methods alone missed many recombinant sequences that were readily identifiable by eye. Occasionally, it was difficult to determine which sequences were genomic and which were PCR recombinants; in these cases, we compared the problematic sequences with unambiguous se- quences from other species to determine the nonrecombined sequences. Singletons were identified with MEGA (6) and were interpreted as PCR errors. Sequences were translated and aligned using MUSCLE, as implemented in Geneious Pro-5.5.6. Data Matrix Assembly. The dataset for each locus consisted of numerous redundant clones. To reduce the number of sequences to one per paralogue per locus, preliminary phylogenetic analyses were conducted for each marker in RAxML (7) including all clones for all taxa. Clones that formed a clade in preliminary analyses and that differed by fewer than five nucleotides were inferred to represent a single locus and were combined into a majority-rule consensus sequence. Clones that did not meet these criteria were kept separate through another round of RAxML analyses. We identified clades with a bootstrap value 50 that comprised the same accession for each locus. Accessions in these clades were reduced to a single majority-rule consensus sequence using the perl script clone_reducer (github.com/ mrmckain). Gene trees were estimated using RAxML v.7.3.0 with the GTR + Γ model and 500 bootstrap replicates for each locus. We used individual gene-tree topologies as a guide to identify and concatenate paralogues from the same genome for each acces- sion. If a polyploid had two paralogues in each gene tree, and one paralogue was always sister to a particular diploid or other poly- ploid, then we inferred that those paralogues represented the same genome and used them to create a concatenated sequence. If a locus did not have a sequence congruent to the topology, then the locus was marked as missing data. Five datasets were assem- bled. One included only accessions with full sampling of all loci for all genomes, a second included accessions that had four out of five markers, a third, three out of five, and so forth. Preliminary trees for all datasets were congruent, but those trees in which some genomes of some taxa were represented by only one or two se- quences were less well supported. The results presented here are based on the dataset with a minimum of three out of five loci for each taxon, which maximized species inclusion while still providing enough information for robust results. All five datasets are depos- ited at Dryad (datadryad.org/resource/doi:10.5061/dryad.5kf38). Phylogenetic Analysis, Divergence Time, and Diversification Estimates. Concatenated trees were reconstructed using both maximum likelihood (ML) and Bayesian approaches and rooted at Paspalum. The ML tree was estimated with RAxML using the GTR + Γ model and 500 bootstrap replicates. The Bayesian tree was esti- mated using MrBayes v.3.2.1 (8) using a gamma model with six discrete categories. Two independent runs with 50 million gen- erations each were sampled every 1,000 generations. Convergence of the separate runs was verified using AWTY (9). Divergence times were estimated using BEAST 1.7.5 (10) on the CIPRES Science Gateway (11). The concatenated analysis was run for 100 million generations sampling every 1,000 under the GTR + Γ model with six gamma categories. The tree prior used the birth-death with incomplete sampling model (12), with the starting tree being estimated using unweighted pair group method with arithmetic mean (UPGMA). The site model fol- lowed an uncorrelated lognormal relaxed clock (13). The anal- ysis was rooted to Paspalum, with the age of the root estimated as a normal distribution describing an age of 25.5 ± 5 million y (14). Convergence statistics were estimated using Tracer v.1.5 (15) after a burn-in of 50,000 sampled generations. Chain con- vergence was estimated to have been met when the effective sample size was greater than 200 for all statistics. Ultimately, 10,000 trees were used in TreeAnnotator v.1.7.5 to produce the maximum clade credibility tree and to determine the 95% highest posterior density (HPD) for each node. The final tree was drawn using FigTree v.1.4.0 (tree.bio.ed.ac.uk/software/ figtree/). Tests for shifts in the underlying model of diversification were conducted using Bayesian Analysis of Macroevolutionary Mix- tures (BAMM) (16); priors were chosen as recommended using the function setBAMMpriors. Analyses were conducted on a tree constructed by pruning the BEAST tree to leave only a single Estep et al. www.pnas.org/cgi/content/short/1404177111 1 of 7 Supporting Information Corrected , Dec mber e 28 2015

Transcript of Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter...

Page 1: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

Supporting InformationEstep et al. 10.1073/pnas.1404177111SI Materials and MethodsTaxon Sampling. We sampled 114 accessions representing twooutgroup species (Paspalum malacophyllum and Plagiantha tenella),two species of Arundinella, and 100 species of Andropogoneae in40 genera. Plant material came from our own collections, the USDepartment of Agriculture (USDA) Germplasm Resources In-formation Network (GRIN), the Kew Millenium Seed Bank, andmaterial sent by colleagues. All plants acquired as seeds (e.g., thosefrom USDA and from Kew) were grown to flowering in thegreenhouse at the University of Missouri-St. Louis to verify iden-tification. Vouchers are listed in Table S1.

Sequencing and Processing. Total genomic DNA was extractedusing a modified cetyl triethylammonium bromide (CTAB)procedure (1) or Qiagen DNeasy kits, following the manu-facturer’s protocol (Qiagen) (2). Five regions of four loci werePCR amplified following Estep et al. (3). The loci are Aberrantpanicle organization1 (apo1), Dwarf8 (d8), two exons of Erectpanicle2 (ep2), and Retarded palea1 (rep1). In sorghum (a dip-loid) apo1 and d8 are on chromosomes 1 and 10, respectively.ep2 and rep1 are on chromosome 2, 14.5 Mbp apart; based onestimated recombination frequency in euchromatic regions, thisdistance could be ca. 50 cM (4). Thus, the four markers areunlinked. Each marker has homologs on two chromosomes inmaize, as expected in a tetraploid. Importantly for this study, wefound no evidence that these loci are lost after polyploidization;in known polyploids, paralogues are found consistently as long assequence depth is sufficient.PCR products were gel purified, cloned using pGEM-T easy

kits, and transformed into JM109 high-efficiency competent cells,following the manufacturers’ protocols (Promega) or usinga Topo-4 cloning vector and transformed into One Shot Top 10cells (Invitrogen). At least eight positive clones for each PCRproduct were sequenced in both directions using universal pri-mers (M13, Sp6, or T7) on an ABI3730 DNA sequencer at thePenn State Huck Institute of the Life Sciences or BeckmanCoulter Genomics. Chromatogram files were trimmed of vectorand low quality sequences manually or using Geneious Pro-5.5.6(BioMatters), and reverse and forward sequences for each clonewere assembled. Only clones with 80% or more double-strandedsequence were used for downstream analyses. Internal primerswere designed as necessary for loci over 1,000 bp long (D8 andEp2 exon7). All good quality contigs for each sample were thenaligned using Geneious, and primer sequences were removed.Recombinants were initially identified by eye and then con-firmed with networks in SplitsTree (5), and removed from thealignment. Use of computational methods alone missed manyrecombinant sequences that were readily identifiable by eye.Occasionally, it was difficult to determine which sequences weregenomic and which were PCR recombinants; in these cases, wecompared the problematic sequences with unambiguous se-quences from other species to determine the nonrecombinedsequences. Singletons were identified with MEGA (6) and wereinterpreted as PCR errors. Sequences were translated andaligned using MUSCLE, as implemented in Geneious Pro-5.5.6.

Data Matrix Assembly. The dataset for each locus consisted ofnumerous redundant clones. To reduce the number of sequencesto one per paralogue per locus, preliminary phylogenetic analyseswere conducted for each marker in RAxML (7) including allclones for all taxa. Clones that formed a clade in preliminaryanalyses and that differed by fewer than five nucleotides were

inferred to represent a single locus and were combined intoa majority-rule consensus sequence. Clones that did not meetthese criteria were kept separate through another round ofRAxML analyses. We identified clades with a bootstrap value ≥50 that comprised the same accession for each locus. Accessionsin these clades were reduced to a single majority-rule consensussequence using the perl script clone_reducer (github.com/mrmckain).Gene trees were estimated using RAxML v.7.3.0 with the

GTR + Γ model and 500 bootstrap replicates for each locus.We used individual gene-tree topologies as a guide to identify andconcatenate paralogues from the same genome for each acces-sion. If a polyploid had two paralogues in each gene tree, and oneparalogue was always sister to a particular diploid or other poly-ploid, then we inferred that those paralogues represented thesame genome and used them to create a concatenated sequence.If a locus did not have a sequence congruent to the topology, thenthe locus was marked as missing data. Five datasets were assem-bled. One included only accessions with full sampling of all loci forall genomes, a second included accessions that had four out of fivemarkers, a third, three out of five, and so forth. Preliminary treesfor all datasets were congruent, but those trees in which somegenomes of some taxa were represented by only one or two se-quences were less well supported. The results presented here arebased on the dataset with a minimum of three out of five loci foreach taxon, which maximized species inclusion while still providingenough information for robust results. All five datasets are depos-ited at Dryad (datadryad.org/resource/doi:10.5061/dryad.5kf38).

Phylogenetic Analysis, Divergence Time, and Diversification Estimates.Concatenated trees were reconstructed using both maximumlikelihood (ML) and Bayesian approaches and rooted at Paspalum.The ML tree was estimated with RAxML using the GTR + Γmodel and 500 bootstrap replicates. The Bayesian tree was esti-mated using MrBayes v.3.2.1 (8) using a gamma model with sixdiscrete categories. Two independent runs with 50 million gen-erations each were sampled every 1,000 generations. Convergenceof the separate runs was verified using AWTY (9).Divergence times were estimated using BEAST 1.7.5 (10) on

the CIPRES Science Gateway (11). The concatenated analysiswas run for 100 million generations sampling every 1,000 underthe GTR + Γ model with six gamma categories. The tree priorused the birth-death with incomplete sampling model (12), withthe starting tree being estimated using unweighted pair groupmethod with arithmetic mean (UPGMA). The site model fol-lowed an uncorrelated lognormal relaxed clock (13). The anal-ysis was rooted to Paspalum, with the age of the root estimatedas a normal distribution describing an age of 25.5 ± 5 million y(14). Convergence statistics were estimated using Tracer v.1.5(15) after a burn-in of 50,000 sampled generations. Chain con-vergence was estimated to have been met when the effectivesample size was greater than 200 for all statistics. Ultimately,10,000 trees were used in TreeAnnotator v.1.7.5 to producethe maximum clade credibility tree and to determine the95% highest posterior density (HPD) for each node. The finaltree was drawn using FigTree v.1.4.0 (tree.bio.ed.ac.uk/software/figtree/).Tests for shifts in the underlying model of diversification were

conducted using Bayesian Analysis of Macroevolutionary Mix-tures (BAMM) (16); priors were chosen as recommended usingthe function setBAMMpriors. Analyses were conducted on a treeconstructed by pruning the BEAST tree to leave only a single

Estep et al. www.pnas.org/cgi/content/short/1404177111 1 of 7

Supporting Information Corrected , Dec mbere 28 2015

Page 2: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

paralogue (genome) per species, as well as on the unpruned trees.Analyses were run for 10 million and 100 million generations;10 million provided an adequate effective sample size for bothrate shifts and log-likelihoods (>>500), and additional generationsdid not change the results. Incomplete taxon sampling was ac-commodated by assigning a backbone sampling fraction of 0.5,corresponding to our sample of 50% of the genera, and assigningspecific fractions for the sample for each clade. Preliminaryanalyses using a backbone sample of 0.08 or using a global esti-mate of sampling intensity produced largely similar results.Differences in speciation rate for polyploids versus diploids

were estimated using the Binary State Speciation and Extinction(BiSSE) model (17) as implemented in Mesquite (18) and di-versitree (19). Species were coded as allopolyploid or diploid,and the characters were mapped on the pruned BEAST tree.Adjustment for incomplete sampling was implemented in di-versitree, using a sample of 0.08 and also 0.5 to correspond to theanalyses in BAMM. Analyses were run for 100,000 generations.Values from an unconstrained analysis were compared withthose of an analysis in which speciation rates were constrained tobe equal.

Genome-Size Estimation. Genome size was measured by flowcytometry at the Flow Cytometry and Imaging Core laboratoryat the Benaroya Research Institute, Virginia Mason ResearchCenter. Genome sizes were measured a minimum of four times

using standard methods, and the values were averaged (20). Pi-cograms of DNA/2C cell were converted to megabasepairs(Mbp)/1C by multiplying the average experimental value by980 Mbp per 1 picogram of DNA and dividing by two.

Estimating the Number of Allopolyploidization Events. To obtaina minimum estimate of the number of allopolyploidy events, welooked for a clear phylogenetic signal of allopolyploidy, in theform of a multiple-labeled gene tree (Fig. S1). This methodprovides unambiguous evidence of allopolyploidy, but willoverlook genetic and taxonomic autopolyploidy, allopolyploidybetween two very similar species or populations of the samespecies, and allopolyploidy for which our sequencing was notsufficiently deep to retrieve paralogues. Thus, the number ofevents estimated here is a robust minimum estimate. We alsoreport genome size for many of the sequenced specimens (Figs. S2and S3). Although genome size gives a rough approximation ofploidy, it cannot distinguish between polyploids and plants whosegenomes have expanded via transposon amplification (21). Wedid not use published data on chromosome numbers because theliterature on these numbers is unreliable; many counts lackphotodocumentation or voucher specimens and thus cannot beverified as to number or species identity. In addition, manyspecies have several numbers reported; these may represent realvariation or errors.

1. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of freshleaf tissue. Phytochem Bull 19:11–15.

2. Teerawatananon A, Jacobs SWL, Hodkinson TR (2011) Phylogenetics of Panicoideae(Poaceae) based on chloroplast and nuclear DNA sequences. Telopea (Syd) 13:115–142.

3. Estep MC, Vela Diaz DM, Zhong J, Kellogg EA (2012) Eleven diverse nuclear-encodedphylogenetic markers for the subfamily Panicoideae (Poaceae). Am J Bot 99(11):e443–e446.

4. Mace ES, Jordan DR (2011) Integrating sorghumwhole genome sequence informationwith a compendium of sorghum QTL studies reveals uneven distribution of QTL andof gene-rich regions with significant implications for crop improvement. Theor ApplGenet 123(1):169–191.

5. Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionarystudies. Mol Biol Evol 23(2):254–267.

6. Tamura K, et al. (2011) MEGA5: Molecular evolutionary genetics analysis using max-imum likelihood, evolutionary distance, and maximum parsimony methods. Mol BiolEvol 28(10):2731–2739.

7. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analy-ses with thousands of taxa and mixed models. Bioinformatics 22(21):2688–2690.

8. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference undermixed models. Bioinformatics 19(12):1572–1574.

9. Nylander JAA, Wilgenbusch JC, Warren DL, Swofford DL (2008) AWTY (are we thereyet?): A system for graphical exploration of MCMC convergence in Bayesian phylo-genetics. Bioinformatics 24(4):581–583.

10. Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics withBEAUti and the BEAST 1.7. Mol Biol Evol 29(8):1969–1973.

11. Miller MA, Pfeiffer W, Schwartz T (2010) Creating the CIPRES science gateway forinference of large phylogenetic trees. Proceedings of the 2010 Gateway ComputingEnvironments Workshop (GCE) (Institute of Electrical and Electronics Engineers, NewYork), pp 1–8.

12. Stadler T (2009) On incomplete sampling under birth-death models and connectionsto the sampling-based coalescent. J Theor Biol 261(1):58–66.

13. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics anddating with confidence. PLoS Biol 4(5):e88.

14. Vicentini A, Barber JC, Giussani LM, Aliscioni SS, Kellogg EA (2008) Multiple coincidentorigins of C4 photosynthesis in the Mid- to Late Miocene. Glob Change Biol 14:2963–2977.

15. Rambaut A, Suchard MA, Xie D, Drummond AJ (2013) Tracer, Version 1.5. Available attree.bio.ed.ac.uk/software/tracer/.

16. Rabosky DL (2014) Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS ONE 9(2):e89543.

17. Maddison WP, Midford PE, Otto SP (2007) Estimating a binary character’s effect onspeciation and extinction. Syst Biol 56(5):701–710.

18. Maddison WP, Maddison DR (2011) Mesquite: A Modular System for EvolutionaryAnalysis, Version 2.75. Available at mesquiteproject.org.

19. FitzJohn RG (2012) Diversitree: Comparative phylogenetic analysis of diversification inR. Methods Ecol. Evol. 3:1084–1092.

20. Arumuganathan K, Earle ED (1991) Estimation of nuclear DNA amounts of plants byflow cytometry. Plant Mol Biol Rep 9:229–241.

21. Kellogg EA, Bennetzen JL (2004) The evolution of nuclear genome structure in seedplants. Am J Bot 91(10):1709–1725.

Estep et al. www.pnas.org/cgi/content/short/1404177111 2 of 7

Page 3: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

ABCDE

diploids, divergent evolution

AE

tetraploid

Evolutionary history Gene tree, full sampling

ABCDE

AE

AE

Examples of gene trees, incomplete sampling

BE

AE

AE

AE1

AE4AE3AE2

ABCDE

AE1

AE4AE3AE2

AE1

AE4AE3AE2

BAE

AE

AE1

AE4AE1AE4

BCE

AE1

AE4AE3

AE1

AE4AE2

No

dive

rsifi

catio

n a

fter

pol

yplo

idy

Div

ersi

ficat

ion

afte

r pol

yplo

idy

ABCDE

hybridization

Fig. S1. Gene-tree patterns resulting from hybridization and polyploidy (genetic allopolyploidy). (Left) The actual evolutionary history of divergent evolutionat the diploid level, followed by hybridization between taxon A and taxon E (reticulation) and formation of a tetraploid. Subsequent speciation may (Lower) ormay not (Upper) occur. (Center) The double-labeled gene trees that will be produced by each pattern of evolution. Tetraploids appear twice in the tree; the Agenome (AE) will appear with its A genome progenitor and the E genome (AE) with its E genome progenitor. (Right) Some of the many possible gene treesthat will result from incomplete taxon sampling. In this study, any of these patterns were taken as indication of allopolyploidy.

Estep et al. www.pnas.org/cgi/content/short/1404177111 3 of 7

Page 4: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

posterior probabtility

1

0.5163

0.03

Pseudosorghum_fasciculare_164

Ischaemum_afrum_020

Phacelurus_digitatus_025

Chrysopogon_serrulatus_056

Arundinella_nepalensis_115

Rottboellia_cochinchinensis_046

Dimeria_fuscescens_137

Sorghum_bicolor_1_037

Imperata_cylindrica_154

Polytrias_indica_073

Microstegium_vimineum_124

Dimeria_sinensis_139

Apocopis_courtllumensis_127

Saccharum_narenga_166

Dimeria_fuscescens_137

Ischaemum_santapaui_021

Coix_gasteenii_134

Tripsacum_dactyloides_041

Kerriochloa_siamensis_157

Pogonatherum_paniceum_072

Apluda_mutica_100

Miscanthus_ecklonii_089

Ischaemum_santapaui_021

Erianthus_arundinaceum_165

Arundinella_hirta-099

Apocopis_siamensis_129

Dimeria_ornithopoda_138

Sorghum_nervosum_039

Arthraxon_lanceolatus_131

Chrysopogon_zizanoides_104

Apocopis_intermedius_128

Polytrias_indica_073

Eulalia_aurea_098

Tripsacum_dactyloides_041

Sorghastrum_nutans_035

Miscanthus_ecklonii_089

Apocopis_intermedius_128

Sorghum_bicolor_182

Coix_lacryma-jobi_053

Eriochrysis_pallida_016

Chrysopogon_gryllus_006

Pogonatherum_crinitum_160

Germainia_capitata_145

Sorghastrum_elliotti_187

Chasmopodium_caudatum_120

Plagiantha_tenella_060

Paspalum_malacophyllum_058

Zea_maysA_997

Erianthus_ravennae_110

Spodiopogon_sibiricus_169

Miscanthus_sinensis_068

Thelepogon_elegans_170

Zea_maysB_998

Chasmopodium_caudatum_120

Ischaemum_muticum_156

Sorghum_bicolor_g_999

Ischaemum_rugosum_186

Chionachne_massiei_133

Sorghum_bicolor_2_038

Ischaemum_indicum_155

Andropterum_stolzii_050

Arthraxon_prionodes_103

Miscanthus_sinensis_068

Phacelurus_huillensis_024

Saccharum_narenga_166

Sehima_nervosum_034

Chrysopogon_gryllus_006

0.96 / 36

0.67 / 12

1 / 100

0.99 / 94

1/100

0.86 / 42

1 / 961 / 98

1/100

0.82 / 291 / 71

1 / 82

0.95 / 62

0.92 / 36

1/100

0.83 / 16

0.98 / 59

1/99

0.73 / 18

1 / 100

0.72 / 17

1 / 32

0.81 / 26

0.58 / 25

1.0 / 73

0.99 / 54

0.85 / 23

1/100

0.88 / 480.95 / 43

1/100

1/100

1

0.79 / 34

1 / 100

1 / 71

0.94 / 66

0.90 / 64

0.92 / 42

0.99 / 89

1/100

0.99 / 64

0.83 / 29

0.98 / 71

1/100

core Andropogoneae(Andropogoninae) (Figure S2)

1

2

3

4

5

6

7

955

3449

1878

677

1805

2597

960

717

2171

1760

2819

1751

2185

1058

1975

1675

1563

1161

8

1 / 100 1 / 100

1 / 100

1 / 100

1 / 100

1 / 1001 / 100

1 / 100

1/100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 1001 / 100

1 / 100

1 / 100

Fig. S2. Bayesian phylogeny of Andropogoneae, species not included in the “core Andropogoneae.” Numbers on branches are posterior probability (pp)/ML.Branch color reflects pp values. Genome-size estimates as Mbp/1C nucleus are indicated after species names, where available. Numbered polyploidizationevents correspond to those in Fig. 1, Figs. S2 and S3, and Table 1. Accession numbers follow the species name for all accessions.

Estep et al. www.pnas.org/cgi/content/short/1404177111 4 of 7

Page 5: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

posteriorprobability

1

0.5163

0.03

Andropogon_gerardii_002

Diheteropogon_hagerupii_183

Themeda_arundinacea_171

Bothriochloa_barbinodis_078

Bothriochloa_ischaemum_063

Bothriochloa_bladhii_093

Bothriochloa_bladhii_076

Hyparrhenia_hirta_189

Bothriochloa_decipiens_086

Dichanthium_annulatum_054

Bothriochloa_insculpta

Bothriochloa_bladhii_082

Hyparrhenia_diplandra_152

Bothriochloa_ischaemum_063

Bothriochloa_hybrida_092

Hyparrhenia_rufa_108

Hyparrhenia_hirta_189

Themeda_villosa_172

Capillipedium_spicigerum_090

Bothriochloa_laguroides_085

Bothriochloa_insculpta

Bothriochloa_woodrovii_083

Schizachyrium_sanguineum_var_hirtiflorum_007

Bothriochloa_bladhii_093

Bothriochloa_exaristata_094

Schizachyrium_sanguineum_032

Schizachyrium_exile_45

Cymbopogon_refractus_012

Capillipedium_parviflorum_005

Dichanthium_annulatum_013

Cymbopogon_parkeri_011

Eulalia_villosa_047Eulalia_quadrinervis_142

Bothriochloa_hybrida_092

Andropogon_perligulatus_003

Dichanthium annulatum_188

Cymbopogon_flexuosus_009

Themeda_triandra_116

Bothriochloa_woodrovii_083

Diheteropogon_amplectens_055

Bothriochloa_exaristata_094

Bothriochloa_insculpta

Bothriochloa_bladhii_082

Bothriochloa_ischaemum_102

Cymbopogon_citratus_135Cymbopogon_parkeri_011

Hyparrhenia_hirta_095

Dichanthium_annulatum_054

Hyparrhenia_rufa_108

Themeda_triandra_116

Hyparrhenia_hirta_095

Cymbopogon_distans_008

Bothriochloa_barbinodis_078

Eulalia_quadrinervis_142

Bothriochloa_hybrida_092

Bothriochloa_macra_079

Themeda_triandra_101

Iseilema_macratherum_061

Bothriochloa_bladhii_087

Schizachyrium_sanguineum_168

Bothriochloa_laguroides_085

Schizachyrium_sanguineum_032

Phacelurus_digitatus

Schizachyrium_sanguineum_var_hirtiflorum_007

Bothriochloa_exaristata_094

Hyparrhenia_hirta_018

Andropogon_glomeratus_192

Hyparrhenia_rufa_153

Hyparrhenia_hirta_018

Dichanthium_aristatum_014

Hyparrhenia_hirta_189

Hyparrhenia_diplandra_152

Bothriochloa_macra_079

Capillipedium_parviflorum_005

Hyparrhenia_hirta_018

Dichanthium_theinlwinii_136

Schizachyrium_brevifolium_167

Andropogon_virginicus _105

Andropogon_chinensis_125

Cymbopogon_obtectus_010

Themeda_triandra_101

Hyparrhenia_hirta_185

Bothriochloa_bladhii_076

Cymbopogon_citratus_135

Andropogon_gayanus_051

Andropogon_eucomus_001

Andropogon_virginicus_075

Hyparrhenia_hirta._185

Spodiopogon_sibiricus

Bothriochloa_alta_077

Schizachyrium_sanguineum_var_hirtiflorum_007

Andropogon_hallii_052

Cymbopogon_obtectus_091

Capillipedium_spicigerum_090

Hyparrhenia_diplandra_177

Andropogon_gerardii_002

Bothriochloa_bladhii_093

Eremopogon_foveolatum_181

Themeda_villosa_172

TK132_Capillipedium_assimile_132

Hyparrhenia_hirta_018

Outgroups

Cymbopogon_citratus_135

Hyparrhenia_rufa_153

Cymbopogon_jwarancusa_ssp_olivieri_049

Schizachyrium_sanguineum_168Schizachyrium_sanguineum_168

Cymbopogon_martinii_048

Bothriochloa_insculpta

Bothriochloa_woodrovii_083

Bothriochloa_decipiens_086

Heteropogon_triteceus_151

Bothriochloa_bladhii_087

1 / 81

0.59 / *

0.60 / 50

0.59 / 29

1 / 96

0.95 / 58

1 / 71

0.78 / 29

1 / 98

1 / 93

0.98 / 44

0.96 / 36

0.83 / 16

1 / 90

0.64 / 50

1 / 32

0.98 / 46

0.83 / 45

1 / 88

0.95 / 69

0.78 / 45

0.98 / 78

0.99 / 54

1 / 960.82 / 29

0.93 / 73

1 / 72

0.99 / 71

0.95 / 98

0.98 / 31

0.99 / 84

1 / 69

0.99 / 100

0.98 / 43

1 / 89

1 / 96

1 / 71

1 / 99

1 / 91

0.73 / 18

1 / 88

0.92 / 42

0.99 / 67

1 / 91

0.52 / 411 / 71

0.57 / 49

1 / 82

0.67 / 12

0.83 / 29

1 / 100

1 / 95

0.58 / 25

1 / 78

1 / 98

1 / 99

1 / 92

0.79 / 58

0.52 / 51

0.99 / 51

1 / 90

0.92 / 43

0.83 / 72

0.52 / 49

0.59 / 44

0.57 / 47

1 / 96

0.92 / 63

0.59 / 8

0.95 / 43

0.59 / *

0.87 / 68

0.99 / 71

1 / 99

0.96 / 34

0.98 / 72

0.99 / 54

13B

C

15

9

10

11

12

A

16

17

18

19

20

21

22

2324

25 26

27

D

E

28

F

ArundinellaArthraxon

Chrysopogon/ThelepogonPhacelurus/Tripsacum/Zea

Apluda/Coix

Sorghum

Saccharum/Ischaemum

GermainiinaeErianthus

542

785

872

1523

1034

1253

1819

1770

779

1105

993

821

1548

2005

765

4125

2273

2053

3880

921

3082

31301427

1236

1510

470

1881

1151

2175

2121

1762

921

1973

2106

1036

14

1 / 1001 / 100

1 / 1001 / 100

1 / 99

1 / 100

1 / 100

1 / 100

1 / 1001 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 1001 / 100

1 / 100

1 / 100

1 / 90

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 1001 / 98

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 1001 / 100

1 / 1001 / 100

1 / 100

1 / 100

1 / 100

1 / 100

1 / 100

“BCD clade”

“coreAndropogoneae”

A’

Fig. S3. Bayesian phylogeny of core Andropogoneae. Branch numbers, colors, and genome sizes are as in Fig. S2.

Estep et al. www.pnas.org/cgi/content/short/1404177111 5 of 7

Page 6: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

0.00 0.310

35

69

Evolutionary Rate

Den

sity

Chrysopogon serrulatusChrysopogon zizanioidesChrysopogon gryllusThelepogon elegans

Arthraxon prionodesArthraxon lanceolatus

Zea mays genome ATripsacum dactyloidesChionachne massieiPhacelurus huillensisPhacelurus digitatusErianthus ravennaeErianthus arundinaceumSaccharum narengaMiscanthus eckloniiPseudosorghum fasciculareMiscanthus sinensisIschaemum muticumIschaemum indicumIschaemum rugosumIschaemum santapauiDimeria sinensisDimeria ornithopodaDimeria fuscescensAndropterum stolziiMicrostegium vimineumKerriochloa siamensisEulalia aureaPolytrias indicaSorghastrum elliottiSorghastrum nutansIschaemum afrumSehima nervosumSpodiopogon sibiricusSchizachyrium sanguineumSchizachyrium sanguineum var. hirtiflorumSchizachyrium exileSchizachyrium brevifoliumAndropogon halliiAndropogon gerardiiAndropogon eucomusAndropogon virginicusAndropogon glomeratusAndropogon perligulatusAndropogon chinensisDiheteropogon hagerupiiDiheteropogon amplectensHyparrhenia diplandraHyparrhenia rufaHyparrhenia hirtaEulalia quadrinervisEulalia villosaCymbopogon distansCymbopogon jwarancusa ssp. olivieriCymbopogon parkeriCymbopopogon martiniiiCymbopogon citratusCymbopogon flexuosusCymbopogon refractusCymbopogon obtectusHeteropogon triticeusThemeda villosaThemeda arundinaceaThemeda triandraDichanthium theinlwiniiDichanthium aristatumDichanthium annulatumCapillipedium parviflorumCapillipedium spicigerumCapillipedium assimileBothriochloa decipiensBothriochloa macraBothriochloa laguroidesBothriochloa insculptaBothriochloa ischaemumBothriochloa bladhiiBothriochloa altaBothriochloa exaristataBothriochloa hybridaBothriochloa woodroviiBothriochloa barbinodisEremopogon foveolatumIseilema macratherumSorghum bicolor genomeGermainia capitataApocopis intermediusApocopis siamensisApocopis cortallumensisPogonatherum paniceumPogonatherum crinitumImperata cylindricaEriochrysis pallidaCoix lacryma-jobiCoix gasteeniiChasmopodium caudatumRottboellia cochinchinensisApluda muticaArundinella nepalensisArundinella hirta

Fig. S4. Phylorate tree from BAMM (16), with species sample frequency adjusted for each clade, and backbone sampling of 0.5. Models of rate shifts with thehighest posterior probability indicate either no shifts or one. The marginal probabilities of a shift are similar across much of the tree; in other words, theposition of the shift cannot be assigned with certainty.

Estep et al. www.pnas.org/cgi/content/short/1404177111 6 of 7

Page 7: Supporting Information - PNAS · Penn State Huck Institute of the Life Sciences or Beckman Coulter Genomics. Chromatogram files were trimmed of vector and low quality sequences manually

0.0 0.2 0.4 0.6 0.8

0

2

4

6

8

10

12

Net Diversification Rate

Prob

abili

ty d

ensi

ty

DiploidPolyploid

Fig. S5. Net diversification rates (speciation minus extinction) for diploids (blue) and polyploids (yellow), calculated using BiSSE, as implemented in diversitree(19). ML estimates of the rates are significantly different (P < 0.02) compared with a model in which speciation rates are constrained to be equal. Analysisassumes a species sample of 8%.

Other Supporting Information Files

Table S1 (DOCX)

Estep et al. www.pnas.org/cgi/content/short/1404177111 7 of 7