#ievobio Keynote - June 26, 2013
Visualizing biodiversity in the era of high-throughput
sequencing
Holly Bik, UC Davis @Dr_Bik
Our ability to visualize high-throughput sequencing data is as
bad as my title slide
$250k, 1 year
“A Research-Driven Data Visualization Framework for High-Throughput Environmental Sequence Data”
http://pitchinteractive.com @pitchinc
“Pitch Interactive dissects large data sets in search of meaningful and often hidden patterns that
serve to determine the shape and form that best tells a story.”
Diverse marine community!
EASY! EASY! EASY!
VERY Difficult!!
Mark Rothko, No. 14, 1960
“rectangles of orange and purple with soft edges”
http://pippascabinet.blogspot.com/2012/11/on-true-love.html
Challenge 1: Environmental data is terrible at revealing fine-scale
taxonomic patterns
[Figure: samples plotted on axes PC1 (13.03%), PC2 (12.21%), PC3 (10.54%). Samples: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific528, Pacific422, Pacific321, Pacific237]
Overarching Community Patterns!
Bik et al. 2012, Molecular Ecology, 21(5):1048-59
[Figure: chart with axis from 0 to 1. Labels: Pre-spill, Nematode Dominance; Post-spill, Fungal Dominance]
Bik et al. 2012, PLoS ONE, 7(6):e38550
[Figure: Grand Isle, Louisiana. Label: Fungi. Legend: Algae; Environmental; Fungi; Metazoa (Acanthocephala, Annelida, Arthropoda, Brachiopoda, Bryozoa, Chordata, Cnidaria, Echiura, Entoprocta, Gastrotricha, Mollusca, Nematoda, Platyhelminthes); No Match; Stramenopiles; Unicellular Eukaryotes]
Bik et al. 2012, PLoS ONE, 7(6):e38550
Exploring Trees
Ecologically, what are these reference taxa doing?
Pertinent info for biological interpretations of DNA data!
Challenge 2: Taxonomic, phylogenetic, and ecological knowledge is imperative for
making meaningful interpretations of high-throughput sequence datasets
Enoplus spp.
Daptonema spp.
Robbea spp.
Caenorhabditis elegans
Actinomyces spp.
Clostridium spp.
Listeria spp.
Synechococcus spp.
Challenge 3: Extreme bioinformatics bottleneck for
microbial eukaryote data
rDNA copy number & genome size in eukaryotes
Prokopowich CD, Gregory TR, Crease TJ. (2003) Genome, 46(1):48–50.
Bik et al., in revision
…and in ONE genus of nematodes
Caenorhabditis brenneri ~323 rRNA gene copies
Caenorhabditis briggsae ~56 rRNA gene copies
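To make the copy-number problem concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that rRNA read counts scale with individuals multiplied by rRNA gene copies per genome and nothing else (no PCR or extraction bias). The function name and the one-individual-each scenario are hypothetical; only the two copy numbers come from the slide.

```python
# Approximate rRNA gene copy numbers quoted on the slide.
COPIES = {"Caenorhabditis brenneri": 323, "Caenorhabditis briggsae": 56}

def expected_read_share(copies, individuals):
    """Share of rRNA reads per species, ASSUMING reads are proportional to
    (number of individuals) x (rRNA gene copies per genome). Illustrative only."""
    weights = {sp: copies[sp] * individuals[sp] for sp in copies}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}

# Hypothetical sample: exactly one individual of each species.
shares = expected_read_share(COPIES, {sp: 1 for sp in COPIES})
for sp, share in shares.items():
    print(f"{sp}: {share:.1%} of rRNA reads")
```

Under that assumption, two equally abundant congeners would look very unequal in a read-count table, which is why read abundance cannot be taken at face value as organism abundance.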
OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175
Porazinska et al. 2010 Zootaxa
Intragenomic variation in Eukaryotic rRNA
[Figure labels: Head, Tail]
Artificial control community containing known nematode species, all with corresponding full-length reference 18S sequences
Head-Tail Pattern in Nematode OTUs
99% cutoff
OTUs as ‘Clouds’
97% cutoff
How to correlate OTUs with biological species?
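The effect of the cutoff can be sketched with a toy greedy clustering. This is not the pipeline used in the talk: it is a minimal centroid-based sketch that assumes reads are already aligned to equal length and measures identity as the fraction of matching positions. The helper names (`identity`, `greedy_otus`, `mutate`) and the 200 bp toy sequences are all hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, cutoff):
    """Greedy centroid clustering: a read joins the first centroid it matches
    at >= cutoff identity; otherwise it founds a new OTU."""
    otus = []  # list of (centroid, members)
    for s in seqs:
        for centroid, members in otus:
            if identity(s, centroid) >= cutoff:
                members.append(s)
                break
        else:
            otus.append((s, [s]))
    return otus

def mutate(seq, positions):
    """Toy helper: swap the base at each position for a different base."""
    bases = "ACGT"
    out = list(seq)
    for p in positions:
        out[p] = bases[(bases.index(out[p]) + 1) % 4]
    return "".join(out)

head = "ACGT" * 50  # 200 bp dominant sequence (the "head")
reads = [
    head,
    mutate(head, [10]),                     # 99.5% identical to head
    mutate(head, [10, 50, 90, 130]),        # 98.0% identical to head
    mutate(head, [20, 60, 100, 140, 180]),  # 97.5% identical to head
]

for cutoff in (0.97, 0.99):
    print(cutoff, len(greedy_otus(reads, cutoff)), "OTU(s)")
```

With this toy data the four reads collapse into one OTU at the 97% cutoff and split into three at 99%: the same variants read as a single "cloud" or as a head plus tail OTUs depending only on the threshold.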
Sparse Databases for Eukaryotes
SILVA 108 Ref rRNA Database (16S/18S)
Bacteria: 530,197
Archaea: 25,658
Eukaryotes: 62,587
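A quick way to see how sparse the eukaryote reference set is: the sketch below turns the three SILVA 108 counts from the slide into shares of the total. The variable names are illustrative; the counts are the only inputs taken from the source.

```python
# Sequence counts per domain as listed on the slide (SILVA 108 Ref).
silva_counts = {"Bacteria": 530_197, "Archaea": 25_658, "Eukaryotes": 62_587}

total = sum(silva_counts.values())
shares = {domain: n / total for domain, n in silva_counts.items()}

for domain, share in shares.items():
    print(f"{domain}: {share:.1%} of reference sequences")
```

Eukaryotes come out at roughly a tenth of the reference sequences in this release.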
Ambiguous Taxonomy
Taxa                     Region 1 95%  Region 2 95%  Region 1 99%  Region 2 99%
Metazoa (20 Phyla)       1360          1461          43255         25668
  Nematoda               765           879           27020         15518
  Annelida               217           197           7073          3869
  Arthropoda             128           178           2280          2323
Unicellular eukaryotes   738           1257          15198         22020
Environmental isolates   774           686           12687         9775
No match                 480           354           11345         1868
Fungi                    225           163           9984          2445
Stramenopiles            137           146           1771          1583
Algae                    111           96            975           861
Total (all taxa)         3825          4163          95215         64220
Deep sea and shallow water marine sediment; 1.2 million reads, 454 GS FLX Titanium
Bik et al. 2012, Molecular Ecology, 21(5):1048-59
Goal 1: A web-based, scalable visualization framework for
standard data formats
Tier One
Standard outputs from bioinformatic pipelines
• BIOM (json) files – OTU tables, metagenome datasets
• Tab-delimited metadata files
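For readers who have not opened a BIOM file, the sketch below shows one way to read a JSON-encoded (BIOM 1.0 style) OTU table using only the Python standard library. The embedded document, the OTU and sample names, and the `biom_to_counts` helper are hypothetical; this is not the visualization framework's own loader. For real data, the dedicated `biom-format` package is likely the better choice.

```python
import json

# A tiny hand-written BIOM 1.0 style document (hypothetical OTUs and samples).
BIOM_TEXT = """
{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0",
  "format_url": "http://biom-format.org",
  "type": "OTU table",
  "generated_by": "hand-written example",
  "date": "2013-06-26T00:00:00",
  "rows": [{"id": "OTU_1", "metadata": null},
           {"id": "OTU_2", "metadata": null}],
  "columns": [{"id": "SampleA", "metadata": null},
              {"id": "SampleB", "metadata": null}],
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [2, 2],
  "data": [[0, 0, 5], [0, 1, 1], [1, 1, 7]]
}
"""

def biom_to_counts(text):
    """Return {sample_id: {otu_id: count}} from a BIOM 1.0 JSON string."""
    doc = json.loads(text)
    otus = [row["id"] for row in doc["rows"]]
    samples = [col["id"] for col in doc["columns"]]
    counts = {s: {o: 0 for o in otus} for s in samples}
    if doc["matrix_type"] == "sparse":
        # Sparse data is a list of [row_index, column_index, value] triples.
        for row, col, value in doc["data"]:
            counts[samples[col]][otus[row]] = value
    else:
        # Dense data is one list of values per row (observation).
        for row, values in enumerate(doc["data"]):
            for col, value in enumerate(values):
                counts[samples[col]][otus[row]] = value
    return counts

counts = biom_to_counts(BIOM_TEXT)
print(counts)
```

The per-sample dictionaries can then be joined to the tab-delimited metadata file on the sample identifier.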
http://explore.climbsf.com
Goal 2: Destroy biologists’ addiction to pie charts
A pie chart is not the most informative way to interpret
biodiversity data!
Tier Two
Bacteria
Archaea
Nematodes
Ciliates
Crustaceans
Circle size = species abundance
Circle color = metadata (sample, temperature, pH, etc.)
Mockup example taken from http://www.wefeelfine.org/
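One way the "circle size = abundance" encoding could be computed is sketched below. The slide does not say whether size means radius or area; this sketch assumes area, so the radius grows with the square root of abundance. The abundance values, the `circle_radius` helper, and the 50-unit maximum radius are hypothetical.

```python
import math

def circle_radius(abundance, max_abundance, max_radius=50.0):
    """Radius chosen so that circle AREA is proportional to abundance (an assumption)."""
    if max_abundance <= 0:
        raise ValueError("max_abundance must be positive")
    return max_radius * math.sqrt(abundance / max_abundance)

# Hypothetical abundances for the groups named in the mockup.
abundances = {"Bacteria": 400, "Archaea": 25, "Nematodes": 100,
              "Ciliates": 64, "Crustaceans": 16}

largest = max(abundances.values())
radii = {taxon: circle_radius(n, largest) for taxon, n in abundances.items()}

for taxon, r in radii.items():
    print(f"{taxon}: radius {r:.1f}")
```

Scaling the radius directly would instead make a group that is 4 times as abundant look 16 times as large by area, which is the kind of distortion the encoding should avoid.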
Goal 4: Find intuitive ways to visualize new data outputs
Explicitly Phylogenetic Approaches
Aligned environmental sequences
Guide Tree
Evolutionary Placement of short reads
http://phylosift.wordpress.com
[Figure: PhyloSift workflow diagram]
Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow; parallel option)
Steps shown: search input against references (LAST fast candidate search); profile HMMs used to align candidates to reference alignment (hmmalign multiple alignment, Infernal multiple alignment; branches for <600 bp and >600 bp); pplacer phylogenetic placement
Outputs shown: Taxonomic Summaries; Sample Analysis & Comparison; Krona plots, number of reads placed for each marker gene; Edge PCA, tree visualization, Bayes factor tests
Probability Distributions: when a pie chart is not a pie chart
Great!
Not Bad
Getting Tricky…
Marine Metagenome
Tree Placement Sing Tree - Guppy
Goal 5: Pester other people Solicit case study participants
Goal 6: (Phase 2) Build a user and developer community
Acknowledgements
Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil
Further Information
• @Dr_Bik – updates posted to Twitter
• Grant proposal now posted on Figshare!