Iroki: automatic customization for phylogenetic trees

Ryan M. Moore1*, Amelia O. Harrison2, Sean M. McAllister2, Rachel L. Marine3, Clara
Chan2, and K. Eric Wommack1

1Center for Bioinformatics & Computational Biology, University of Delaware, Newark, DE,
USA
2School of Marine Science and Policy, University of Delaware, Newark, DE, USA
3Department of Biological Sciences, University of Delaware, Newark, DE, USA

Corresponding author’s information
*To whom correspondence should be addressed
Address: Delaware Biotechnology Institute, 15 Innovation Way, Newark, Delaware
19711
(Tel): (302) 831-4362
(Fax): (302) 831-3447
(E-mail): [email protected]

bioRxiv preprint, doi: https://doi.org/10.1101/106138; this version posted February 13, 2017.
The copyright holder for this preprint (which was not certified by peer review) is the
author/funder. It is made available under a CC-BY 4.0 International license.

Abstract
Background
Phylogenetic trees are an important analytical tool for examining species and community
diversity, and the evolutionary history of species. In the case of microorganisms,
decreasing sequencing costs have enabled researchers to generate ever-larger sequence
datasets, which in turn have begun to fill gaps in the evolutionary history of microbial
groups. However, phylogenetic analyses of large sequence datasets present challenges to
extracting meaningful trends from complex trees. Scientific inferences made by visual
inspection of phylogenetic trees can be simplified and enhanced by customizing various
parts of the tree, including label color, branch color, and other features. Yet, manual
customization is time-consuming and error-prone, and programs designed to assist in
batch tree customization often require programming experience. To address these
limitations, we developed Iroki, a program for fast, automatic customization of
phylogenetic trees. Iroki allows the user to incorporate information on a broad range of
metadata for each experimental unit represented in the tree.

Results
Iroki was applied to four existing microbial sequence datasets to demonstrate its utility in
data exploration and presentation. Iroki was used to highlight connections between viral
phylogeny and host taxonomy, to explore the abundance of microbial groups across
samples of cattle hide, to examine short-term temporal dynamics of virioplankton
communities, and to search for trends in the biogeography of Zetaproteobacteria.

Conclusions
Iroki is an easy-to-use web app and command line application for fast, automatic
customization of phylogenetic trees based on user-provided categorical or continuous
metadata. Iroki allows for rapid hypothesis testing through visualizing custom
phylogenetic trees, streamlining the process of phylogenetic data exploration and
presentation.

Availability
Iroki can be accessed through a web app or via installation through RubyGems, from
source, or through the Iroki Docker image. All source code and documentation are
available under the GPLv3 license at https://github.com/mooreryan/iroki. The Iroki web
app is accessible at www.iroki.net or through the Virome portal
(http://virome.dbi.udel.edu), and its source code is released under the GPLv3 license at
https://github.com/mooreryan/iroki_web. The Docker image can be found here:
https://hub.docker.com/r/mooreryan/iroki.

Keywords
Phylogeny, visualization, sequence analysis, bioinformatics, metagenomics

Background
Studies in microbial ecology often use phylogenetic trees as a means for assessing the
diversity and evolutionary history of microorganisms. As the cost of sequencing has
declined, researchers have been able to gather ever-larger sequence datasets. While large
sequence datasets have begun to fill in the gaps in the evolutionary history of microbial
groups [1–5], they have also posed new analytical challenges, as extracting meaningful
trends within such high-dimensional datasets can be cumbersome. In particular,
scientific inferences made by visual inspection of phylogenetic trees can be simplified and
enhanced by customizing various parts of the tree, including label and branch color,
branch width, and other features. Though many tree visualization packages allow for
manual modifications [6–9], the process can be time-consuming and error-prone,
especially when the tree contains many nodes. While a handful of existing programs
address the issue of tree visualization, most are not capable of batch customization, and
those that are often require programming experience [10–13].

Iroki, a program for fast, automatic customization of phylogenetic trees, was developed to
address these limitations and enable users to incorporate information on a broad range of
metadata for each experimental unit represented in the tree. Iroki is available for use
through a web interface at www.iroki.net, through the Virome portal
(http://virome.dbi.udel.edu), and through a UNIX command line tool. Results are saved in
the widely used Nexus format with color metadata tailored for use with FigTree [8] (a
freely available and efficient tree viewer).

Implementation
Iroki enhances visualization of phylogenetic trees by coloring node labels and branches
according to categorical metadata criteria or numerical data such as abundance
information. Iroki can also rename nodes in a batch process according to user
specifications so that node names are more descriptive. A tree file in Newick format
containing a phylogenetic tree is always required. Additional required input files depend
on the operation(s) desired. Coloring functions require a color map or a biom [14] file.
Node renaming functions require a name map. The color map, name map, and biom files
are created by the user and, along with the Newick file, form the inputs for Iroki.

Explicit tree coloring
Iroki’s principal functionality involves coloring node labels and/or branches based on
information provided by the user in the color map. The color map text file contains either
two or three tab-delimited columns depending on how branches and labels are to be
colored. Two columns, pattern and color, are used when labels and branches are to have
the same color. Three columns, pattern, label color, and branch color, are used when
branches and labels are to have different colors. Patterns are searched against node labels
either as regular expressions or exact string matches.
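
The two- and three-column layouts described above can be illustrated with a short Python sketch. This is not Iroki's own code: the function names are hypothetical, and the rule that the first matching pattern wins is an assumption made for the example.

```python
import re


def parse_color_map(text):
    """Parse tab-delimited rows into (pattern, label_color, branch_color)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) == 2:
            # Two columns: labels and branches share one color.
            pattern, color = fields
            rows.append((pattern, color, color))
        elif len(fields) == 3:
            # Three columns: separate label and branch colors.
            rows.append(tuple(fields))
        else:
            raise ValueError(f"expected 2 or 3 columns: {line!r}")
    return rows


def colors_for(label, rows, regex=False):
    """Return (label_color, branch_color) for a node label.

    Assumption: the first matching row wins; unmatched labels stay black.
    """
    for pattern, label_color, branch_color in rows:
        matched = re.search(pattern, label) if regex else pattern == label
        if matched:
            return label_color, branch_color
    return "black", "black"


rows = parse_color_map("^Sipho\tskyblue\nMyo\ttomato\t#FF78F6\n")
print(colors_for("Sipho_phage_1", rows, regex=True))
print(colors_for("Myo", rows))
```

With regular expression matching, one row can color a whole family of labels; with exact matching, each label needs its own row.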

Entries in the color column can be any of the 657 named colors in the R programming
language [15] (e.g., skyblue, tomato, goldenrod2, lightgray, black) or any valid
hexadecimal color code (e.g., #FF78F6). In addition, Iroki provides a 19-color palette
with complementary colors based on Kelly’s color scheme for maximum contrast [16].
Nodes in the tree that are not in the color map will remain black.

Depending on user-specified options, a pattern match to node label(s) will trigger coloring
of the label and/or the branch directly connected to that label. Inner branches will be
colored to match their descendant branches if all descendants are the same color,
allowing quick identification of common ancestors and clades that share common
metadata.
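
The inner-branch rule can be sketched as a small recursion. The example below assumes a toy tree representation (a leaf is a label string, an inner node is a list of children) and a hypothetical function name; it illustrates the rule and is not Iroki's implementation.

```python
def branch_color(node, leaf_colors, default="black"):
    """Color of the branch leading to `node`.

    A leaf takes its color from `leaf_colors`. An inner node takes the
    shared color of its children when they all agree, otherwise `default`.
    """
    if isinstance(node, str):
        return leaf_colors.get(node, default)
    child_colors = {branch_color(child, leaf_colors, default) for child in node}
    return child_colors.pop() if len(child_colors) == 1 else default


# Toy tree: ((a,b),(c,(d,e)))
tree = [["a", "b"], ["c", ["d", "e"]]]
leaf_colors = {"a": "red", "b": "red", "c": "blue", "d": "blue", "e": "green"}

print(branch_color(tree[0], leaf_colors))  # clade (a,b): all red
print(branch_color(tree[1], leaf_colors))  # mixed clade falls back to black
```

Because a mixed clade reports the default color upward, a single differently colored descendant is enough to leave every ancestor branch black.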

Tree coloring based on numerical data
Iroki provides the ability to generate color gradients based on numerical data, such as
absolute or relative abundance, from a tab-delimited biom format file. Single-color
gradients use color saturation to illustrate numerical differences, with nodes that have
higher values being more saturated than those with lower values. For example, highly
abundant nodes will be represented by more highly saturated colors. Two-color gradients
show numerical differences through both color mixing and luminosity. Additionally, the biom
file may specify numerical information for one group (e.g., abundance in a particular
sample) or for two groups (e.g., abundance in the treatment group vs. abundance in the
control group). For biom files with one group, single- or two-color gradients may be used.
However, biom files specifying two-group metadata may only use the two-color gradient.
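
As a rough illustration of the two gradient types, the sketch below maps a value to a hex color using only the Python standard library. The function names, the default hue, and the linear RGB mixing are assumptions for this example; it shows the color-mixing part of a two-color gradient only and does not reproduce Iroki's luminosity handling.

```python
import colorsys


def to_hex(r, g, b):
    """Convert RGB components in [0, 1] to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


def single_color_gradient(value, max_value, hue=0.6):
    """Higher values give a more saturated version of one hue."""
    saturation = value / max_value if max_value else 0.0
    return to_hex(*colorsys.hsv_to_rgb(hue, saturation, 1.0))


def two_color_gradient(value, max_value, low=(0.0, 0.0, 1.0), high=(0.0, 1.0, 0.0)):
    """Linearly mix two RGB colors according to the value."""
    t = value / max_value if max_value else 0.0
    mixed = tuple((1 - t) * lo + t * hi for lo, hi in zip(low, high))
    return to_hex(*mixed)


# Hypothetical abundances for three nodes.
abundance = {"otu1": 0.0, "otu2": 5.0, "otu3": 10.0}
top = max(abundance.values())
for name, value in abundance.items():
    print(name, single_color_gradient(value, top), two_color_gradient(value, top))
```

Scaling by the maximum value keeps the most abundant node at full saturation, so the gradient always uses the whole color range of the dataset.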

Renaming nodes
Some packages for generating phylogenetic trees restrict the use of special characters and
spaces or require node names to be shorter than a specified length (e.g., RAxML [17],
PHYLIP [18]). Name restrictions present challenges to scientific interpretation of
phylogenetic trees. Iroki’s renaming function uses a two-column, tab-delimited name map
to associate current node names, exactly matching those in the tree file, with new names.
The new name column has no restrictions on name length or character type. Iroki ensures
name uniqueness by appending integers to the ends of names, if necessary.
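
The renaming step with its uniqueness guarantee can be sketched as follows. The function name and the underscore-plus-integer suffix are assumptions for illustration; the text states only that integers are appended when needed.

```python
def rename_nodes(labels, name_map):
    """Map current names to new names, keeping every result unique.

    Labels missing from `name_map` keep their current name. When a name is
    already taken, an integer suffix is appended (assumed format: name_2).
    """
    used = set()
    renamed = []
    for label in labels:
        base = name_map.get(label, label)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        renamed.append(candidate)
    return renamed


name_map = {"s1": "E. coli O157:H7", "s2": "E. coli O157:H7"}
print(rename_nodes(["s1", "s2", "s3"], name_map))
```

Checking each candidate against the set of names already in use also covers the case where a suffixed name happens to collide with an existing label.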

Combining the color map, name map, and biom files
Iroki can be used to make complex combinations of customizations by combining the
color map, name map, and biom files. For example, a biom file can be used to apply a
color gradient based on numerical data to the labels of a tree, a color map can be used to
separately color the branches based on user-specified conditions, and a name map can be
used to rename nodes in a single command or web request. Iroki follows a specific order
of precedence when applying multiple customizations. First, the color gradient inferred
from the biom file is applied. Next, the color map is applied to specified labels or
branches, overriding the gradient applied in the previous step if necessary. Finally, the
name map is used to map current names to the new names (Fig. 1).
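
The order of precedence can be written down as a minimal sketch. It handles label colors only, uses exact matching, and takes plain dictionaries as stand-ins for the parsed biom file, color map, and name map; all names here are hypothetical.

```python
def customize_labels(labels, gradient_colors, color_map_colors, name_map):
    """Apply customizations in the order described in the text.

    1. Color from the gradient (biom file), if any.
    2. Color from the color map, overriding the gradient.
    3. Rename using the name map, applied last.
    Returns a list of (final_name, label_color) pairs.
    """
    result = []
    for label in labels:
        color = "black"  # nodes without any color information stay black
        if label in gradient_colors:
            color = gradient_colors[label]
        if label in color_map_colors:
            color = color_map_colors[label]
        result.append((name_map.get(label, label), color))
    return result


gradient = {"s1": "#ccccff", "s2": "#0000ff"}
color_map = {"s2": "tomato"}
names = {"s1": "Sample 1 (hide)"}
print(customize_labels(["s1", "s2", "s3"], gradient, color_map, names))
```

Renaming last means patterns in the color map and rows in the biom file are written against the original node names in the Newick file.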

Output
Iroki outputs the modified tree in the Nexus format. When rendering the phylogenetic tree,
FigTree reads the Nexus format file and interprets the color metadata written by Iroki.
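
A rough sketch of what color-annotated Nexus output could look like is given below. It assumes FigTree-style `[&!color=#rrggbb]` comments attached to taxon labels; the function name and block layout are illustrative, and the exact text that Iroki writes is not specified here.

```python
def nexus_with_colors(newick, label_colors):
    """Wrap a Newick string in a Nexus file with colored taxon labels.

    Assumption: colors are hex codes and are attached to taxon labels as
    FigTree-style [&!color=...] comments.
    """
    lines = [
        "#NEXUS",
        "begin taxa;",
        f"\tdimensions ntax={len(label_colors)};",
        "\ttaxlabels",
    ]
    for label, color in label_colors.items():
        quoted = label.replace("'", "''")  # Nexus doubles single quotes
        lines.append(f"\t'{quoted}'[&!color={color}]")
    lines += [";", "end;", "", "begin trees;", f"\ttree tree_1 = {newick}", "end;"]
    return "\n".join(lines) + "\n"


print(nexus_with_colors("(A,B);", {"A": "#ff0000", "B": "#0000ff"}))
```

Keeping the color annotations in bracketed comments leaves the tree itself valid Nexus, so viewers that do not understand the annotations can still ignore them.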

Results & Discussion

Global diversity of bacteriophage
Viruses are the most abundant biological entities on Earth, providing an enormous reservoir
of genetic diversity, driving evolution of their hosts, influencing composition of microbial
communities, and affecting global biogeochemical cycles [19,20]. The current viral
taxonomic system is based on a suite of physical characteristics of the virion rather than
on genome sequences. The phage proteomic tree, created to provide a genome-based
taxonomic system for bacteriophage classification [21], was recently updated to include
hundreds of new phage genomes from the Phage SEED reference database [22], as well as
long assembled contigs from viral shotgun metagenomes (viromes) collected from the
Chesapeake Bay (SERC) [23] and the Mediterranean Sea [24].

Taxonomy and host information metadata were collected for the viral genome sequences, a
color map was created to assign colors based on viral family and host phyla, and Iroki was
used to add color metadata to branches and labels of the phage proteomic tree. Since a
large number of colors were required on the tree, Iroki’s Kelly color palette was used to
provide clear color contrasts. The tree was rendered with FigTree (Fig. 2).

Adding color to the phage proteomic tree with Iroki shows trends in the data that would
be difficult to discern without color. Uncultured phage contigs from the SERC and
Mediterranean viromes make up a large portion of all phage sequences shown on the tree,
and are widely distributed among known phage. In general, viruses in the same family
clustered together; for example, branch coloring highlights large groups of closely related
Siphoviridae and Myoviridae. The label-coloring scheme also shows that viruses infecting
hosts within the same phylum are, in general, phylogenetically similar. For example, viruses
within one of the multiple large groups of Siphoviridae across the tree infect almost
exclusively host species within the same phylum, e.g., Siphoviridae infecting
Actinobacteria group apart from Siphoviridae infecting Firmicutes or Proteobacteria.

Bacterial community diversity and prevalence of E. coli in beef cattle
Shiga toxin-producing Escherichia coli (STEC) are dangerous human pathogens that
colonize the lower gastrointestinal tracts of cattle and other ruminants. STEC-contaminated
beef and STEC shed in the feces of these animals are major sources of
foodborne illness. To identify possible interactions between STEC populations and the
commensal cattle microbiome, a recent study examined the diversity of the bacterial
community associated with beef cattle hide [25]. Fecal and hide samples were collected
over twelve weeks and SSU rRNA amplicon libraries were constructed and analyzed by
Illumina sequencing [26]. The study indicated that the community structure of hide
bacterial communities was altered when the hides were positive for STEC contamination.

Iroki was used to visualize changes in the relative abundance of each cattle hide bacterial
OTU according to the presence or absence of STEC. A Mann-Whitney U test comparing
OTU abundance between STEC-positive and STEC-negative samples was performed, and
those bacterial OTUs showing a significant change in relative abundance (p < 0.5) were
placed on a phylogenetic tree according to the 16S rRNA sequence. Branches of the tree
were colored based on whether there was a significant change in relative abundance with
STEC contamination (red: p <= 0.05, blue: p > 0.05). Node labels were colored along a
blue-green color gradient representing the abundance ratio of OTUs between samples
with STEC (blue) and without (green). Additionally, label luminosity was determined
based on overall abundance (lighter: less abundant, darker: more abundant) (Fig. 3). Iroki
makes it clear that most OTUs on the tree showed a significant difference in abundance
(branch coloring) between STEC-positive and STEC-negative samples (node coloring).
Furthermore, we can see that most OTUs are at low abundance with only a few highly
abundant OTUs (label luminosity). The color gradient added by Iroki allows us to see that
the abundant OTUs were evolutionarily distant from one another and thus spread out
across many phylogenetic groups.
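
The branch-coloring rule used here (red for p <= 0.05, blue otherwise) could be turned into a two-column color map with a few lines of Python. The OTU names and p-values below are made up for illustration, and the function name is hypothetical; only the thresholds and colors come from the text.

```python
def color_map_from_pvalues(pvalues, alpha=0.05, significant="red", other="blue"):
    """Build tab-delimited color map text: one 'name<TAB>color' row per OTU."""
    lines = []
    for otu, p in sorted(pvalues.items()):
        color = significant if p <= alpha else other
        lines.append(f"{otu}\t{color}")
    return "\n".join(lines) + "\n"


# Hypothetical p-values, as might come from a Mann-Whitney U test per OTU.
pvalues = {"otu2": 0.2, "otu1": 0.01, "otu3": 0.05}
print(color_map_from_pvalues(pvalues), end="")
```

Generating the color map from the test results, rather than typing it by hand, keeps the tree colors in step with the statistics if the analysis is rerun.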

Iroki can be used to quickly test hypotheses without investing a large amount of time
annotating trees manually. A UPGMA tree was created based on unweighted UniFrac
distance [27] between 356 bacterial community profiles based on SSU rRNA amplicon
sequences from cattle hide and feces samples (Fig. 4). Iroki was used to evaluate
similarities in sample bacterial communities according to the sampling location. Iroki
colored branches based on whether the sample originated from feces (blue) or from hide
(red). The coloring added by Iroki shows a clear partitioning of bacterial communities on
the tree based on their sampling location (hide or feces). However, four fecal samples
clustered with hide samples, and two hide samples clustered with fecal samples, highlighting
the ability of Iroki to easily identify good candidates for more in-depth examination.
Additionally, Iroki was used to illustrate a correlation between one of the most abundant
bacterial families, Ruminococcaceae, and sampling location. Iroki colored node labels
with a color gradient based on Ruminococcaceae family abundance, utilizing both a
single-color gradient (Fig. 4A) and a two-color gradient (Fig. 4B). Custom trees were
visualized using FigTree. Iroki’s automatic color gradient and ability to label branches
and nodes based on different criteria clearly show that Ruminococcaceae is more
abundant in fecal samples than in hide samples.

Short-term dynamics of virioplankton
The gene encoding ribonucleotide reductase (RNR) is common within viral genomes and
thus can be used as a marker gene for studying viral diversity. Moreover, RNR
polymorphism is predictive of some of the biological and ecological features of viral
populations [28]. A mesocosm experiment examined the short-term dynamics of phage
populations using RNR amplicon sequences, specifically, sequences of class II RNRs of
bacteriophages infecting cyanobacterial hosts. A phylogenetic tree was created from the
Cyano II RNR amplicon sequences and Iroki was used to color nodes and branches based
on the time point (0 h, 6 h, 12 h) at which each amplicon sequence was observed. The
customized tree was then visualized using FigTree (Fig. 5). Iroki’s coloring showed that no
phylogenetic clade was dominated by OTUs observed at any particular time point; rather,
time points were spread relatively evenly across clades. This analysis demonstrates Iroki’s
utility for exploring sequence datasets, allowing the researcher to quickly and easily test
hypotheses.

Phylogeny of Zetaproteobacteria within a biogeographic context
Biogeographical studies assess the distribution of an organism’s biodiversity across space
and time. The extent to which microorganisms exhibit biogeography is an open question
in microbial ecology. The isolated nature of the microbial communities associated with
deep-ocean hydrothermal vents provides an ideal system for studying the biogeography of
microbes. In particular, iron-oxidizing bacteria have been shown to thrive in vent fluids,
sediments, and iron-rich microbial mats associated with the vents. Globally, iron-oxidizing
bacteria make significant contributions to the iron and carbon cycles. A recent
study analyzed multiple SSU rRNA clone libraries to investigate the biogeography of
Zetaproteobacteria, a class containing many iron-oxidizing bacterial species, across
three sampling regions of the Pacific Ocean (central Pacific: Loihi Seamount; western
Pacific: Southern Mariana Trough; and southern Pacific: Vailulu’u Seamount/Tonga
Arc/East Lau Spreading Center/Kermadec Arc) [29]. Sequences were aligned and a
phylogeny was inferred as described in [29]. Iroki was used to examine the relationship
between sampling location and phylotype by adding branch and label color based on
geographic location and renaming original node labels with OTU and location metadata.
The custom tree was visualized using FigTree (Fig. 6). In some cases, OTUs contained
sequences from only one sampling location (e.g., OTUs 12, 15, and 16), whereas other
OTUs were distributed among more than one sampling location (e.g., OTUs 1, 2, and 4).
Often, sequences sampled from the same geographic location are in the same phylotype
despite being members of different OTUs (e.g., OTUs 10 and 19).

Availability and requirements
A web-based version of Iroki can be accessed online at www.iroki.net or through the
Virome portal (http://virome.dbi.udel.edu/). For users who wish to run Iroki locally, a
command line version of the program is installable via RubyGems or from source on GitHub
(https://github.com/mooreryan/iroki). A Docker image is available for users who desire the
flexibility of the command line tool but do not want to install Iroki or manage its
dependencies (https://hub.docker.com/r/mooreryan/iroki). Docker is a convenient method
for packaging an application with all of its dependencies so that it runs the same way
regardless of the user’s environment [30,31]. The README provided with the source
code gives detailed instructions for setting up and running Iroki. Further
documentation and tutorials can be found at the Iroki Wiki
(https://github.com/mooreryan/iroki/wiki).

License
Iroki and its associated programs are released under the GNU General Public License
version 3 [32].

Conclusions
Iroki is a command line program and web app for fast, automatic customization of large
phylogenetic trees based on user-specified configuration files describing categorical or
continuous metadata. The output files include Nexus tree files with color
metadata tailored specifically for use with FigTree. Various example datasets from
microbial ecology studies were analyzed to demonstrate Iroki’s utility. In each case, Iroki
simplified the processes of data exploration, data presentation, and hypothesis testing.
Iroki provides a simple and convenient way to rapidly customize phylogenetic trees,
especially in cases where the tree in question is too large to annotate manually or in
studies with many trees to annotate.

List of Abbreviations
OTU: operational taxonomic unit
RNR: ribonucleotide reductase
STEC: Shiga toxin-producing Escherichia coli

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
Data and code used to generate figures are available on GitHub at
https://github.com/mooreryan/iroki_manuscript_data

Funding
This project was supported by the USDA National Institute of Food and Agriculture award
number 2012-68003-30155 and the National Science Foundation Advances in
Bioinformatics program (award number DBI_1356374).

Competing Interests
The authors declare that they have no competing interests.

Authors’ contributions
RMM and SMM conceived the project. RMM wrote the manuscript and implemented
Iroki. AOH and RLM processed and analyzed Cyano II amplicons. All authors read,
edited, and approved the final manuscript.

Acknowledgements
We would like to acknowledge Daniel J. Nasko and Jessica M. Chopyk for their work on
the phage proteomic tree, and Barbra D. Ferrell for editing the manuscript.

References
1. Lan Y, Rosen G, Hershberg R. Marker genes that are less conserved in their sequences
are useful for predicting genome-wide similarity levels between closely related prokaryotic
strains. Microbiome. 2016;4:18.
2. Larkin AA, Blinebry SK, Howes C, Lin Y, Loftus SE, Schmaus CA, et al. Niche
partitioning and biogeography of high light adapted Prochlorococcus across taxonomic
ranks in the North Pacific. ISME J. 2016;1–13.
3. Simister RL, Deines P, Botté ES, Webster NS, Taylor MW. Sponge-specific clusters
revisited: A comprehensive phylogeny of sponge-associated microorganisms. Environ.
Microbiol. 2012;14:517–24.
4. Wu Z, Yang L, Ren X, He G, Zhang J, Yang J, et al. Deciphering the bat virome catalog
to better understand the ecological diversity of bat viruses and the bat origin of emerging
infectious diseases. ISME J. 2016;10:609–20.
5. Müller AL, Kjeldsen KU, Rattei T, Pester M, Loy A. Phylogenetic and environmental
diversity of DsrAB-type dissimilatory (bi)sulfite reductases. ISME J. 2015;9:1152–65.
6. University of Washington. Phylogeny Programs.
http://evolution.genetics.washington.edu/phylip/software.html#Plotting. Accessed 2016 Jul
21.
7. Zhang H, Gao S, Lercher MJ, Hu S, Chen WH. EvolView, an online tool for visualizing,
annotating and managing phylogenetic trees. Nucleic Acids Res. 2012;40.
8. Rambaut A. FigTree. http://tree.bio.ed.ac.uk/software/figtree/. Accessed 2016 Jul 21.
9. Zmasek CM. Archaeopteryx.
https://sites.google.com/site/cmzmasek/home/software/archaeopteryx. Accessed 2016 Jul
21.
10. Paradis E, Claude J, Strimmer K. APE: Analyses of phylogenetics and evolution in R
language. Bioinformatics. 2004;20:289–90.
11. Revell LJ. phytools: An R package for phylogenetic comparative biology (and other
things). Methods Ecol. Evol. 2012;3:217–23.
12. Huerta-Cepas J, Dopazo J, Gabaldón T. ETE: a python Environment for Tree
Exploration. BMC Bioinformatics. 2010;11:24.
13. Chen W-H, Lercher MJ, Ganfornina M, Gutierrez G, Bastiani M, Sanchez D, et al.
ColorTree: a batch customization tool for phylogenic trees. BMC Res. Notes. BioMed
Central; 2009;2:155.
14. McDonald D, Clemente JC, Kuczynski J, Rideout J, Stombaugh J, Wendel D, et al. The
Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love
the ome-ome. Gigascience. 2012;1:7.
15. Ripley BD. The R project for statistical computing. 2001. p. 1–3.
16. Kelly KL. Twenty-two colors of maximum contrast. Color Eng. 1965. p. 26–7.
17. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of
large phylogenies. Bioinformatics. 2014;30:1312–3.
18. Felsenstein J. PHYLIP. http://evolution.gs.washington.edu/phylip.html. Accessed 2016
Jul 21.
19. Suttle CA. Marine viruses: major players in the global ecosystem. Nat. Rev.
Microbiol. Nature Publishing Group; 2007;5:801–12.
20. Rohwer F, Thurber RV. Viruses manipulate the marine environment. Nature. Nature
Publishing Group; 2009;459:207–12.
21. Rohwer F, Edwards R. The Phage Proteomic Tree: a genome-based taxonomy for
phage. J. Bacteriol. 2002;184:4529–35.
22. Phage SEED. http://www.phantome.org/PhageSeed/Phage.cgi. Accessed 2016 Jul 21.
23. Wommack KE, Nasko DJ, Chopyk J, Sakowski EG. Counts and sequences,
observations that continue to change our understanding of viruses in nature. J. Microbiol.
2015;53:181–92.
24. Mizuno CM, Rodriguez-Valera F, Kimes NE, Ghai R. Expanding the marine virosphere
using metagenomics. PLoS Genet. Public Library of Science; 2013;9:e1003987.
25. Chopyk J, Moore RM, DiSpirito Z, Stromberg ZR, Lewis GL, Renter DG, et al. Presence
of pathogenic Escherichia coli is correlated with bacterial community diversity and
composition on pre-harvest cattle hides. Microbiome. BioMed Central; 2016;4:9.
26. Fadrosh DW, Ma B, Gajer P, Sengamalay N, Ott S, Brotman RM, et al. An improved
dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina
MiSeq platform. Microbiome. 2014;2:6.
27. Lozupone C, Knight R. UniFrac: a New Phylogenetic Method for Comparing Microbial
Communities. Appl. Environ. Microbiol. American Society for Microbiology;
2005;71:8228–35.
28. Sakowski EG, Munsell EV, Hyatt M, Kress W, Williamson SJ, Nasko DJ, et al.
Ribonucleotide reductases reveal novel viral diversity and predict biological and
ecological features of unknown marine viruses. Proc. Natl. Acad. Sci. National Academy
of Sciences; 2014;111:15786–91.
29. McAllister SM, Davis RE, McBeth JM, Tebo BM, Emerson D, Moyer CL. Biodiversity 393
and emerging biogeography of the neutrophilic iron-oxidizing Zetaproteobacteria. Appl. 394
Environ. Microbiol. American Society for Microbiology (ASM); 2011;77:5445–57. 395
30. Biodocker. http://biodocker.org/. Accessed 2016 Jul 21. 396
31. Merkel D. Docker: lightweight Linux containers for consistent development and 397
deployment. Linux J. Belltown Media; 2014;2014:2. 398
32. GNU Operating System. http://www.gnu.org/licenses/. Accessed 2016 Jul 21. 399
Fig. 1: Precedence of Iroki’s customization pipeline
Flowchart illustrating the precedence of steps when performing multiple customizations
with Iroki. Input/output files are purple, command-line options are blue, and processes
are orange. The choice of exact or regular expression matching guides each subsequent
step of the process. Iroki gives higher precedence to processes towards the bottom of the
diagram. For example, if a user selects the options for coloring both labels and
branches and provides both a biom file and a color map that specifies colors for the
labels only, then the branches will be colored according to the color gradient inferred
from the biom file, whereas the labels will be colored according to the rules specified
in the color map.
[Figure 1 flowchart. Input/output files: Newick file, name map, color map, biom file, Nexus file. Command-line options: matching option (exact or regexp matching?) and coloring options (color branches, labels, or both?). Processes: apply color gradient, apply specified colors, rename nodes. A precedence axis runs from lower to higher.]
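The precedence example in the Fig. 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not Iroki's implementation: the function name `resolve_colors`, the dictionary layout, and the sample node names and colors are all assumptions made for the example. Gradient colors (as would be inferred from a biom file) are applied first to both labels and branches, and color map rules are applied afterwards, overriding only the targets they specify.

```python
import re


def resolve_colors(names, gradient, color_map, use_regex=False):
    """Resolve label and branch colors for each node name.

    gradient:  {name: color}, the lower-precedence colors (hypothetical
               stand-in for a gradient inferred from abundance data).
    color_map: list of (pattern, rule) pairs, where rule may set "label",
               "branch", or both; these take higher precedence.
    use_regex: match patterns as regular expressions instead of exactly.
    """
    resolved = {}
    for name in names:
        # Lower precedence: the gradient colors both labels and branches.
        base = gradient.get(name)
        colors = {"label": base, "branch": base}
        # Higher precedence: explicit rules override only what they specify.
        for pattern, rule in color_map:
            if use_regex:
                matched = re.search(pattern, name) is not None
            else:
                matched = pattern == name
            if matched:
                colors.update(rule)
        resolved[name] = colors
    return resolved


# Hypothetical inputs: the color map specifies a label color only.
gradient = {"otu1": "#2e8b57", "otu2": "#98fb98"}
color_map = [("otu1", {"label": "red"})]
result = resolve_colors(["otu1", "otu2"], gradient, color_map)
print(result["otu1"])
```

With these inputs, the label of `otu1` takes the color map's value while its branch keeps the gradient color, which mirrors the behavior described in the caption.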
Fig. 2: Comparing phage and their host phyla
All phage genomes from Phage SEED together with assembled virome contigs from the
Chesapeake Bay and Mediterranean Sea. Iroki highlights phylogenetic trends after coloring
branches according to viral family or, in the case of virome contigs, sampling location
(marked with an asterisk in the legend), and coloring node labels according to the host
phylum of the phage.
[Figure 2 image: phylogenetic tree with numeric leaf labels (individual labels not reproduced); scale bar: 0.9. Legend, branches (viral family): SERC*, Siphoviridae, Myoviridae, Mediterranean*, Podoviridae, Microviridae, Inoviridae, Leviviridae, Cystoviridae, Other/Unclassified. Legend, labels (host phylum): Proteobacteria, Firmicutes, Actinobacteria, Bacteroidetes, Cyanobacteria, Crenarchaeota, Euryarchaeota, Tenericutes, Other/Unclassified.]
Fig. 3: Changes in OTU abundance in two sample groups
Approximately-maximum-likelihood tree of OTUs that showed significant differences in
relative abundance between STEC-positive and STEC-negative cattle hide samples.
Branches are colored by the p-value of a Mann-Whitney U test examining changes in
abundance between STEC-positive and STEC-negative samples (p < 0.05, red;
p >= 0.05, blue). Label color on a blue-green color gradient highlights OTU occurrence
based on the abundance ratio between STEC-positive samples (blue) and STEC-negative
samples (green). For example, labels that are darker green had a higher abundance in
STEC-negative samples. Node luminosity represents overall abundance, with lighter nodes
being less abundant than darker nodes.
Fig. 4: Comparing cattle fecal and hide samples and the abundance of Ruminococcaceae
UPGMA tree of pairwise unweighted UniFrac distances between 356 bacterial community
profiles based on SSU rRNA amplicon sequences from cattle hide and feces. Branches are
colored by sample origin: feces (blue) and hide (red). Node label coloring enables rapid
testing of the hypothesis that the abundance of Ruminococcaceae, one of the most abundant
families, is correlated with sample origin. Labels are colored by (A) a green
single-color gradient (color saturation increases with increasing abundance of
Ruminococcaceae OTUs) and (B) a light green (low abundance of Ruminococcaceae OTUs) to
dark blue (high abundance of Ruminococcaceae OTUs) color gradient.
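The two kinds of gradient described in the Fig. 4 caption can be illustrated with simple linear interpolation in RGB space. The sketch below is an assumption-laden illustration rather than Iroki's actual color computation: the helper names, the specific RGB endpoints, and the min-max scaling of abundance are all chosen for the example.

```python
def normalize(value, low, high):
    """Scale value into [0, 1] relative to the observed range (min-max scaling)."""
    if high == low:
        return 0.0
    return min(max((value - low) / (high - low), 0.0), 1.0)


def interpolate(start_rgb, end_rgb, fraction):
    """Linearly interpolate between two RGB colors and return a hex string."""
    channels = (round(s + (e - s) * fraction) for s, e in zip(start_rgb, end_rgb))
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Illustrative endpoint colors (assumed, not taken from the figure).
WHITE = (255, 255, 255)
GREEN = (0, 255, 0)
LIGHT_GREEN = (144, 238, 144)
DARK_BLUE = (0, 0, 139)


def single_color_gradient(fraction):
    """Green whose saturation grows with fraction (white at 0, pure green at 1)."""
    return interpolate(WHITE, GREEN, fraction)


def two_color_gradient(fraction):
    """Light green at low abundance, dark blue at high abundance."""
    return interpolate(LIGHT_GREEN, DARK_BLUE, fraction)


# Hypothetical abundances for three samples.
abundances = {"sample_a": 0.0, "sample_b": 12.0, "sample_c": 40.0}
low, high = min(abundances.values()), max(abundances.values())
for sample, abundance in abundances.items():
    fraction = normalize(abundance, low, high)
    print(sample, single_color_gradient(fraction), two_color_gradient(fraction))
```

Interpolating from white to a pure hue at full brightness is equivalent to raising the saturation of that hue, which is why the single-color case can reuse the same helper as the two-color case.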
Fig. 5: Temporal dynamics of virioplankton populations according to Cyano II RNR
amplicon phylogeny
An approximately-maximum-likelihood phylogenetic tree of 200 randomly selected class
II Cyano RNR representative sequences from 98% clusters. Iroki was used to color
branches by time point: zero hours, yellow; six hours, orange; and twelve hours, purple.
Fig. 6: Zetaproteobacteria show biogeographic partitioning
Phylogenetic tree showing placement of full-length Zetaproteobacteria SSU rRNA
sequences with outgroups. Iroki was used to color labels and branches by geographic
location of the sampling site (Loihi, green; Mariana, gold; South Pacific, purple; and
Other, gray), as well as to rename the nodes with OTU and sampling site metadata.
[Figure 6 image: phylogenetic tree; scale bar: 0.04. Legend (Location): Loihi, Mariana, South Pacific, Other. Zetaproteobacteria leaves carry labels of the form "OTU12 Loihi S60" (OTU, sampling location, and an S-numbered identifier, S1 through S84). Outgroup labels: Sphaerotilus natans, Hyphomonas jannaschiana, Aquifex pyrophilus, Leptothrix discophora, Nitrospina gracilis, Rhodobacter gluconicum, Gallionella ferruginea, Thermotoga maritima, Sulfurovum lithotrophicum, Sulfurimonas autotrophica, Nitratiruptor tergarcus, Thiothrix disciformis, Thiomicrospira psychrophila, Marinobacter aquaeolei, Desulfurella acetivorans, Roseobacter denitrificans, Hippea maritima.]
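The node renaming mentioned in the Fig. 6 caption can be sketched as a lookup applied to each label in a Newick string. This is a minimal illustration under stated assumptions, not Iroki's renaming code: the function names are hypothetical, the input tree and the old-to-new name pairs are invented for the example, and the regular expression assumes a simple Newick string with no whitespace, quoted labels, or bracketed comments.

```python
import re

# A label is a run of characters that follows "(", ",", or ")" and contains
# none of the Newick structural characters. Branch lengths follow ":" and are
# therefore never matched.
LABEL = re.compile(r"(?<=[(,)])[^(),:;]+")


def quote_if_needed(name):
    """Single-quote a label that contains whitespace or Newick punctuation."""
    if re.search(r"[\s()\[\]':;,]", name):
        return "'" + name.replace("'", "''") + "'"
    return name


def rename_nodes(newick, name_map):
    """Replace every label that has an exact match in name_map."""
    def swap(match):
        old = match.group(0)
        if old in name_map:
            return quote_if_needed(name_map[old])
        return old
    return LABEL.sub(swap, newick)


# Hypothetical tree and name map (old name -> new name with metadata).
tree = "((S60:0.1,S61:0.2)n1:0.3,S1:0.4);"
name_map = {"S60": "OTU12 Loihi S60", "S1": "OTU1_Loihi_S1"}
renamed = rename_nodes(tree, name_map)
print(renamed)
```

New names containing spaces are wrapped in single quotes because unquoted Newick labels may not contain blanks; labels absent from the map, such as `S61` and the internal node `n1` here, are left unchanged.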