#ievobio Keynote - June 26, 2013

Visualizing biodiversity in the era of high-throughput sequencing Holly Bik, UC Davis @Dr_Bik


Transcript of #ievobio Keynote - June 26, 2013

Page 1: #ievobio Keynote - June 26, 2013

Visualizing biodiversity in the era of high-throughput sequencing

Holly Bik, UC Davis @Dr_Bik

Page 2: #ievobio Keynote - June 26, 2013

Our ability to visualize high-throughput sequencing data is as bad as my title slide

Page 3: #ievobio Keynote - June 26, 2013


$250k, 1 year

“A Research-Driven Data Visualization Framework for High-Throughput Environmental Sequence Data”

Page 4: #ievobio Keynote - June 26, 2013

http://pitchinteractive.com @pitchinc

Page 5: #ievobio Keynote - June 26, 2013

“Pitch Interactive dissects large data sets in search of meaningful and often hidden patterns that serve to determine the shape and form that best tells a story.”

Page 6: #ievobio Keynote - June 26, 2013

Diverse marine community!

EASY! EASY! EASY!

VERY Difficult!

Page 7: #ievobio Keynote - June 26, 2013
Page 8: #ievobio Keynote - June 26, 2013

Mark Rothko, "No. 14, 1960"

"rectangles of orange and purple with soft edges"

Page 9: #ievobio Keynote - June 26, 2013

http://pippascabinet.blogspot.com/2012/11/on-true-love.html

Page 10: #ievobio Keynote - June 26, 2013

Challenge 1: Environmental data is terrible at revealing fine-scale taxonomic patterns

Page 11: #ievobio Keynote - June 26, 2013

[Figure: ordination plot of samples ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific237, Pacific321, Pacific422, Pacific528; axes PC1 (13.03%), PC2 (12.21%), PC3 (10.54%)]

Overarching Community Patterns

Bik et al. 2012, Molecular Ecology, 21(5):1048-59

Page 12: #ievobio Keynote - June 26, 2013

[Figure: axis from 0 to 1 in steps of 0.1; labels: Pre-spill, Nematode Dominance; Post-spill, Fungal Dominance]

Bik et al. 2012, PLoS ONE, 7(6):e38550

Page 13: #ievobio Keynote - June 26, 2013

[Figure legend: taxonomic groups]
Algae
Environmental
Fungi
Metazoa: Annelida
Metazoa: Arthropoda
Metazoa: Gastrotricha
Metazoa: Nematoda
Metazoa: Platyhelminthes
No Match
Stramenopiles
Unicellular Eukaryotes
Metazoa: Acanthocephala
Metazoa: Brachiopoda
Metazoa: Bryozoa
Metazoa: Chordata
Metazoa: Cnidaria
Metazoa: Echiura
Metazoa: Entoprocta
Metazoa: Mollusca

Callout label: Fungi

Grand Isle, Louisiana

Bik et al. 2012, PLoS ONE, 7(6):e38550

Page 14: #ievobio Keynote - June 26, 2013

Exploring Trees

Ecologically, what are these reference taxa doing?

Page 15: #ievobio Keynote - June 26, 2013

Pertinent info for biological interpretations of DNA data

Page 16: #ievobio Keynote - June 26, 2013

Challenge 2: Taxonomic, phylogenetic, and ecological knowledge is imperative for making meaningful interpretations of high-throughput sequence datasets

Page 17: #ievobio Keynote - June 26, 2013

Enoplus spp.
Daptonema spp.
Robbea spp.
Caenorhabditis elegans
Actinomyces spp.
Clostridium spp.
Listeria spp.
Synechococcus spp.

Page 18: #ievobio Keynote - June 26, 2013

Challenge 3: Extreme bioinformatics bottleneck for microbial eukaryote data

Page 19: #ievobio Keynote - June 26, 2013

rDNA copy number & genome size in eukaryotes

Prokopowich CD, Gregory TR, Crease TJ. (2003) Genome, 46(1):48–50.

Page 20: #ievobio Keynote - June 26, 2013

Bik et al., in revision

…and in ONE genus of nematodes

Caenorhabditis brenneri ~323 rRNA gene copies

Caenorhabditis briggsae ~56 rRNA gene copies
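
Why copy number matters for read-based abundance can be sketched with simple arithmetic. The snippet below assumes, purely for illustration, that read counts scale linearly with rRNA gene copy number. The function name and the example read counts are hypothetical; only the two copy numbers come from the slide.

```python
# Approximate rRNA gene copy numbers, as given on the slide.
COPIES = {"Caenorhabditis brenneri": 323, "Caenorhabditis briggsae": 56}

def copy_corrected_abundance(reads, copies):
    """Divide read counts by rRNA copy number, then renormalize to fractions.

    Assumes reads scale linearly with copy number (an illustrative
    simplification, not a claim about real amplification behaviour).
    """
    corrected = {species: reads[species] / copies[species] for species in reads}
    total = sum(corrected.values())
    return {species: value / total for species, value in corrected.items()}

# Hypothetical sample with equal numbers of individuals per species,
# so the raw read counts simply mirror the copy numbers.
reads = {"Caenorhabditis brenneri": 3230, "Caenorhabditis briggsae": 560}
raw_share = reads["Caenorhabditis brenneri"] / sum(reads.values())
corrected = copy_corrected_abundance(reads, COPIES)

print(f"raw read share of C. brenneri: {raw_share:.2f}")
print(f"copy-corrected share of C. brenneri: {corrected['Caenorhabditis brenneri']:.2f}")
```

Under these assumptions the raw reads suggest that one species dominates, while the corrected shares are equal.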

Page 21: #ievobio Keynote - June 26, 2013

OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175

Porazinska et al. 2010 Zootaxa

Intragenomic variation in Eukaryotic rRNA

Tail

Head

Artificial control community containing known nematode species, all with corresponding full length reference 18S sequences

Head-Tail Pattern in Nematode OTUs
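
The head-tail pattern can be made concrete using the Reads column of the table on this slide. In the sketch below, "head" means the single most abundant OCTU and "tail" means all the others. That split is an illustrative definition rather than one stated on the slide, and the function name is hypothetical.

```python
# Read counts per OCTU matching B. seani, in table order (as read from the slide).
READS = [63, 9, 8, 1, 2, 1, 1, 2, 1, 2, 9, 1, 1, 1, 2, 1, 1, 3, 1]

def head_tail_summary(reads):
    """Return total reads, the head's share of reads, and the number of tail OTUs."""
    total = sum(reads)
    head = max(reads)
    return {
        "total_reads": total,
        "head_share": head / total,
        "tail_otus": len(reads) - 1,
    }

summary = head_tail_summary(READS)
print(summary)
```

Read this way, a single species in the control community yields one dominant OTU plus a long tail of low-abundance variants.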

Page 22: #ievobio Keynote - June 26, 2013

99% cutoff

OTUs as ‘Clouds’

97% cutoff

How to correlate OTUs with biological species?
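
How the identity cutoff changes the number of OTUs can be sketched with a toy greedy clustering. This is not the clustering method behind the slide. Identity here is a naive position-by-position match on equal-length sequences, and all sequences and function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, cutoff):
    """Assign each sequence to the first centroid it matches at >= cutoff."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= cutoff:
                clusters[i].append(seq)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters

def mutate(seq, positions):
    """Swap the base at each position for a different one (A<->C, G<->T)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

head = "ACGT" * 50  # hypothetical 200 bp "true" sequence
variants = [
    head,
    mutate(head, [3]),                     # 99.5% identical to head
    mutate(head, [10, 50, 120]),           # 98.5% identical to head
    mutate(head, [5, 40, 90, 150, 199]),   # 97.5% identical to head
]
for cutoff in (0.97, 0.99):
    print(cutoff, len(greedy_otus(variants, cutoff)))
```

With these toy sequences, the same four variants collapse into one OTU at the 97% cutoff and split into three at 99%, which is the ambiguity the slide's question points at.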

Page 23: #ievobio Keynote - June 26, 2013

Sparse Databases for Eukaryotes

SILVA 108 Ref rRNA Database (16S/18S)

Bacteria: 530,197

Archaea: 25,658

Eukaryotes: 62,587
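
The imbalance is easier to see as proportions. The short calculation below uses only the three counts listed on the slide; the function name is hypothetical.

```python
# Reference sequence counts per domain in SILVA 108 Ref, as listed on the slide.
SILVA_108 = {"Bacteria": 530_197, "Archaea": 25_658, "Eukaryotes": 62_587}

def domain_shares(counts):
    """Return each domain's fraction of all reference sequences."""
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

shares = domain_shares(SILVA_108)
for domain, share in shares.items():
    print(f"{domain}: {share:.1%}")
```

Eukaryotes account for roughly one in ten of these reference sequences.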

Page 24: #ievobio Keynote - June 26, 2013

Ambiguous Taxonomy

Taxa                      Region 1 95%  Region 2 95%  Region 1 99%  Region 2 99%
Metazoa (20 Phyla)        1360          1461          43255         25668
  Nematoda                765           879           27020         15518
  Annelida                217           197           7073          3869
  Arthropoda              128           178           2280          2323
Unicellular eukaryotes    738           1257          15198         22020
Environmental isolates    774           686           12687         9775
No match                  480           354           11345         1868
Fungi                     225           163           9984          2445
Stramenopiles             137           146           1771          1583
Algae                     111           96            975           861
Total (all taxa)          3825          4163          95215         64220

Deep sea and shallow water marine sediment; 1.2 million reads, 454 GS FLX Titanium

Bik et al. 2012, Molecular Ecology, 21(5):1048-59

Page 25: #ievobio Keynote - June 26, 2013

Goal 1: A web-based, scalable visualization framework for standard data formats

Page 26: #ievobio Keynote - June 26, 2013

Tier One

Standard outputs from bioinformatic pipelines

Page 27: #ievobio Keynote - June 26, 2013

•  BIOM (json) files: OTU tables, metagenome datasets
•  Tab-delimited metadata files
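
The two input types listed above can be read with nothing more than a JSON and a CSV parser. The following is a minimal sketch, assuming the BIOM 1.0 JSON layout (`rows`, `columns`, `matrix_type`, `data`). The function names, the `SampleID` column name, and the tiny example table are hypothetical and are not taken from the project described in the talk.

```python
import csv
import io
import json

def load_biom_counts(biom_text):
    """Return {observation_id: {sample_id: count}} from BIOM 1.0 JSON text."""
    table = json.loads(biom_text)
    row_ids = [row["id"] for row in table["rows"]]
    col_ids = [col["id"] for col in table["columns"]]
    counts = {r: {c: 0 for c in col_ids} for r in row_ids}
    if table["matrix_type"] == "sparse":
        # Sparse data is a list of [row_index, column_index, value] triples.
        for r, c, value in table["data"]:
            counts[row_ids[r]][col_ids[c]] = value
    else:
        # Dense data is a list of rows, each a list of per-sample values.
        for r, row in enumerate(table["data"]):
            for c, value in enumerate(row):
                counts[row_ids[r]][col_ids[c]] = value
    return counts

def load_metadata(tsv_text, id_column="SampleID"):
    """Return {sample_id: {column: value}} from tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row for row in reader}

# Hypothetical two-OTU, two-sample table and matching metadata.
biom_text = json.dumps({
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    "rows": [{"id": "OTU_1", "metadata": None}, {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None}, {"id": "S2", "metadata": None}],
    "data": [[0, 0, 5], [1, 1, 12]],
})
metadata_text = "SampleID\tDepth_m\nS1\t10\nS2\t2500\n"

counts = load_biom_counts(biom_text)
metadata = load_metadata(metadata_text)
print(counts["OTU_2"]["S2"], metadata["S2"]["Depth_m"])
```

Keeping counts and metadata keyed by the same sample IDs makes it straightforward to join them later when drawing a chart.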

Page 28: #ievobio Keynote - June 26, 2013
Page 29: #ievobio Keynote - June 26, 2013

http://explore.climbsf.com

Page 30: #ievobio Keynote - June 26, 2013

Goal 2: Destroy biologists’ addiction to pie charts

Page 31: #ievobio Keynote - June 26, 2013

A pie chart is not the most informative way to interpret biodiversity data!

Page 32: #ievobio Keynote - June 26, 2013

Tier Two

Page 33: #ievobio Keynote - June 26, 2013

[Figure: mockup with circles labeled Bacteria, Archaea, Nematodes, Ciliates, Crustaceans]

Circle size = species abundance
Circle color = metadata (sample, temperature, pH, etc.)

Mockup example taken from http://www.wefeelfine.org/
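
The size and color encoding in this mockup can be sketched as two small mapping functions. The choices here are assumptions rather than anything specified on the slide: circle area (not radius) is made proportional to abundance, and a numeric metadata value is mapped onto a blue-to-red scale. All names and example values are hypothetical.

```python
import math

def circle_radius(abundance, max_abundance, max_radius=40.0):
    """Scale the radius so that circle AREA is proportional to abundance."""
    if max_abundance <= 0:
        raise ValueError("max_abundance must be positive")
    return max_radius * math.sqrt(abundance / max_abundance)

def metadata_color(value, low, high):
    """Map a numeric metadata value onto a blue-to-red RGB hex color."""
    t = 0.0 if high == low else (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    red, blue = round(255 * t), round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"

# Hypothetical taxa with read counts and a sample temperature in Celsius.
taxa = [("Nematodes", 400, 4.0), ("Ciliates", 100, 12.0), ("Bacteria", 25, 20.0)]
max_count = max(count for _, count, _ in taxa)
for name, count, temp in taxa:
    radius = circle_radius(count, max_count)
    print(name, round(radius, 1), metadata_color(temp, 4.0, 20.0))
```

Scaling by area avoids visually exaggerating abundant taxa, which happens when the radius itself is made proportional to the count.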

Page 34: #ievobio Keynote - June 26, 2013

Goal 4: Find intuitive ways to visualize new data outputs

Page 35: #ievobio Keynote - June 26, 2013

Explicitly Phylogenetic Approaches

Aligned environmental sequences

Guide Tree

Evolutionary Placement of short reads

Page 36: #ievobio Keynote - June 26, 2013

http://phylosift.wordpress.com

Page 37: #ievobio Keynote - June 26, 2013

[Figure: workflow diagram; recoverable labels listed below]

Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow)
Search input against references: LAST fast candidate search
Read-length branches: <600 bp, >600 bp
Alignment: hmmalign multiple alignment; Infernal multiple alignment
Note: profile HMMs used to align candidates to reference alignment; parallel option
Placement: pplacer phylogenetic placement
Taxonomic Summaries: Krona plots, number of reads placed for each marker gene
Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests

Page 38: #ievobio Keynote - June 26, 2013

Probability Distributions: when a pie chart is not a pie chart

Page 39: #ievobio Keynote - June 26, 2013
Page 40: #ievobio Keynote - June 26, 2013

Great!

Not Bad

Getting Tricky…

Page 41: #ievobio Keynote - June 26, 2013

Marine Metagenome

Tree Placement Sing Tree - Guppy

Page 42: #ievobio Keynote - June 26, 2013

Goal 5: Pester other people Solicit case study participants

Page 43: #ievobio Keynote - June 26, 2013
Page 44: #ievobio Keynote - June 26, 2013

Goal 6: (Phase 2) Build a user and developer community

Page 45: #ievobio Keynote - June 26, 2013

Acknowledgements

Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil

Further Information

•  [email protected]

•  @Dr_Bik – updates posted to Twitter

•  Grant proposal now posted on Figshare!
