Rachel Adams - SMBE Euks Meeting

Next-generation sequencing for microbial ecology:

alpha diversity, beta diversity, and biases in high-throughput sequencing

Rachel Adams, Andrew Rominger

Sara Branco, Thomas Bruns

The microbiome of the built environment

Understudied but fundamental ecological habitat

Implications for human health

Sick building syndrome

Metrics are practically absent: composition and quantitative characteristics

Need comparison of “typical” buildings and high replication across settings to detect patterns

The What and Why of the indoor microbiome

[Diagram, built up over three slides, with unknowns marked "?": Architecture, Ventilation, Building function, Environmental setting, Residents]


Fungi in the indoor microbiome, and beyond

[Diagram, built up over three slides: growth forms (yeasts, filaments) and lifestyles (saprobes; symbionts, ranging from parasites (−) to mutualists (+))]

Assessing environmental fungi

1. An estimated 5-20% of fungi grow in culture
2. Identification requires a fungal taxonomist

[Diagram of the nuclear ribosomal RNA region: SSU rRNA (18S), ITS1, 5.8S, ITS2, LSU rRNA (28S)]

Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi - Schoch et al. 2012

High-throughput sequencing has greatly expanded capabilities in microbial ecology

[Figure: example barcode sequences: ACGAGTGCGT, ACGCTCGACA, AGACGCACTC, AGCACTGTAG, ATCAGACACG]

10^4 – 10^7 sequence reads


alpha, beta, gamma diversity

[Diagram, built up over several slides: three samples with alpha diversities α1, α2, α3; pairwise beta diversities β12, β13, β23; gamma diversity γ across all samples]
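The three diversity levels in the diagram can be made concrete with a small calculation. The sketch below is only an illustration: it assumes alpha and gamma are measured as richness (number of taxa present) and pairwise beta as Jaccard dissimilarity, neither of which is stated on the slide, and the count table is hypothetical.

```python
from itertools import combinations

def alpha_richness(sample):
    """Alpha diversity as richness: number of taxa present in one sample."""
    return sum(1 for count in sample if count > 0)

def gamma_richness(samples):
    """Gamma diversity as richness: taxa present in at least one sample."""
    n_taxa = len(samples[0])
    return sum(1 for j in range(n_taxa) if any(s[j] > 0 for s in samples))

def jaccard_dissimilarity(a, b):
    """Pairwise beta diversity: 1 - (shared taxa / taxa found in either sample)."""
    present_a = {j for j, c in enumerate(a) if c > 0}
    present_b = {j for j, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)

# Hypothetical counts: rows are samples 1-3, columns are taxa.
samples = [
    [10, 5, 0, 0],
    [8, 0, 3, 0],
    [0, 0, 4, 6],
]

alphas = [alpha_richness(s) for s in samples]            # α1, α2, α3
gamma = gamma_richness(samples)                          # γ
betas = {(i + 1, j + 1): jaccard_dissimilarity(samples[i], samples[j])
         for i, j in combinations(range(len(samples)), 2)}  # β12, β13, β23

print("alpha:", alphas)
print("gamma:", gamma)
print("beta:", betas)
```

Any other richness or dissimilarity measure could be substituted without changing the alpha/beta/gamma structure.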


Groundtruthing high-throughput sequencing for alpha richness

α_true < α_est

Kunin et al. 2010

Groundtruthing high-throughput sequencing

[Diagram: true samples (α1, α2, α3; β12, β13, β23) pass through high-throughput sequencing to give observed samples (α1+, α2+, α3+; β12?, β13?, β23?)]

In terms of diversity, we know that α can be elevated in high-throughput sequenced communities...

...but how does that change conclusions of ecological processes that are based on β diversity?

Beta diversity: the variation in species composition among sites

A key component of community ecology: linking processes to this compositional variation

Adams et al., ISME Journal, 2013

Question and hypotheses

Do errors that inflate alpha diversity bias conclusions on beta diversity between samples?

Why would it?
• Particular taxa in one environment grouping do not amplify, or amplify in a way that skews the relative abundance of all others*
• Clustering incorrectly groups divergent taxa or splits identical taxa

Hypothesis: No

While richness/diversity estimates will be off for any given sample, conclusions about beta diversity will be robust to the errors



Simulation process

Initial community: Sample 1, Sample 2, …, Sample i by OTU1, OTU2, …, OTUj
Simulated community: Sample 1, Sample 2, …, Sample i by OTU1, OTU2, …, OTUk

1. Initial communities: expected relative abundance of OTUs
2. Variation in taxon-specific amplification: biased relative abundance
3. Sequence error: biased relative abundance + error
4. Clustering OTUs: biased relative abundance + error + clustering
5. Simulated communities
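The simulation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The distributions are the ones named on the backup slides (Beta(0.5, 1.0) for PCR bias, Binomial(n=100, p) for the number of splits, Beta(0.5, 0.5) for split location), but how they are applied here is an assumption: amplification is modelled as a per-OTU multiplicative efficiency, and each split moves a fraction of the parent OTU's abundance in every sample into a new spurious OTU. All function names and the input table are hypothetical.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def amplify(community, rng, a=0.5, b=1.0):
    """Taxon-specific amplification: scale each OTU column by an
    efficiency drawn from Beta(a, b) (assumed multiplicative)."""
    n_otus = len(community[0])
    efficiency = [rng.betavariate(a, b) for _ in range(n_otus)]
    return [[count * e for count, e in zip(sample, efficiency)]
            for sample in community]

def split_otus(community, rng, p_split, n_trials=100):
    """Sequence error plus clustering, modelled as spurious OTU splits.

    Assumption: each OTU is split Binomial(n_trials, p_split) times, and
    each split moves a Beta(0.5, 0.5) fraction of the parent's abundance
    in every sample into a newly appended column.
    """
    result = [list(sample) for sample in community]
    for j in range(len(community[0])):
        for _ in range(binomial(rng, n_trials, p_split)):
            fraction = rng.betavariate(0.5, 0.5)
            for sample in result:
                moved = sample[j] * fraction
                sample[j] -= moved
                sample.append(moved)
    return result

def simulate(community, seed=0, p_split=0.0334):
    """Initial communities -> amplification bias -> splitting -> simulated."""
    rng = random.Random(seed)
    return split_otus(amplify(community, rng), rng, p_split)

# Hypothetical initial community: 2 samples by 3 OTUs (j = 3).
initial = [[50.0, 30.0, 20.0], [10.0, 60.0, 30.0]]
simulated = simulate(initial)
print("OTUs before:", len(initial[0]), "after:", len(simulated[0]))  # k >= j
```

Splitting only ever adds columns, so the simulated table has at least as many OTUs as the initial one, which matches the OTUj to OTUk notation on the slide.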


Model summary – 2 types of errors

1. Create group differences that aren’t there (Type I error)

[Figure: NMDS ordinations (NMDS1 vs. NMDS2) of true and perceived communities]

2. Lose group differences that are there (Type II error)

[Figure: NMDS ordinations (NMDS1 vs. NMDS2) of true and perceived communities]

Model summary output

1. Presence of bias: statistical categorical differences

Groups     R2     p-value
Location   0.02   0.34
Season     0.20   0.001

2. Degree of bias: percentage difference between true and simulated communities

Normalized error = (Simulated distance − True distance) / True distance

Morisita-Horn distance metric
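As an illustration of the degree-of-bias measure, the sketch below computes a Morisita-Horn distance (one minus the standard Morisita-Horn overlap index) for a true and a simulated pair of samples, then the normalized error between them. The abundance vectors are hypothetical, and the function names are not taken from the original analysis. Both samples are assumed to be non-empty.

```python
def morisita_horn_distance(x, y):
    """1 - Morisita-Horn overlap for two abundance vectors of equal length."""
    total_x, total_y = sum(x), sum(y)
    dx = sum(v * v for v in x) / total_x ** 2   # Simpson-type dominance of x
    dy = sum(v * v for v in y) / total_y ** 2   # Simpson-type dominance of y
    cross = sum(a * b for a, b in zip(x, y))
    overlap = 2 * cross / ((dx + dy) * total_x * total_y)
    return 1.0 - overlap

def normalized_error(simulated_distance, true_distance):
    """(Simulated distance - True distance) / True distance."""
    return (simulated_distance - true_distance) / true_distance

# Hypothetical true pair of samples over three OTUs.
true_a, true_b = [60, 30, 10], [10, 30, 60]
# Hypothetical simulated pair: the third OTU has been split in sample A.
sim_a, sim_b = [60, 30, 6, 4], [10, 30, 60, 0]

d_true = morisita_horn_distance(true_a, true_b)
d_sim = morisita_horn_distance(sim_a, sim_b)
print("true distance:", round(d_true, 3))
print("simulated distance:", round(d_sim, 3))
print("normalized error:", round(normalized_error(d_sim, d_true), 3))
```

In this toy case the split makes the two samples look more different than they are, so the normalized error is positive.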

Model findings

Categorical differences are robust to high-throughput sequencing errors in alpha diversity, regardless of the underlying patterns of beta diversity

The degree of bias is not affected by the underlying patterns of beta diversity but depends on community characteristics

Model summary – Type I & II error

[Figure: p-values (0.0 to 1.0) for true vs. simulated communities, in two panels: no groups, two groups]

Whether groups are different or the same will not be biased by inflated alpha diversity
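The slides do not name the test behind the R2 and p-values. One common choice for distance matrices is a permutational multivariate analysis of variance (PERMANOVA), and the sketch below assumes that choice purely for illustration. The distance matrix and group labels are hypothetical, and the code assumes within-group distances are never all zero.

```python
import random

def sums_of_squares(dist, labels):
    """Total and within-group sums of squares from pairwise distances."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for group in set(labels):
        members = [i for i in range(n) if labels[i] == group]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(members)
                         for j in members[k + 1:]) / len(members)
    return ss_total, ss_within

def permanova(dist, labels, permutations=999, seed=0):
    """Return (R2, p-value) for group differences in a distance matrix."""
    n, n_groups = len(labels), len(set(labels))

    def statistics(current_labels):
        ss_total, ss_within = sums_of_squares(dist, current_labels)
        ss_among = ss_total - ss_within
        pseudo_f = (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))
        return pseudo_f, ss_among / ss_total

    f_observed, r2 = statistics(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)          # break any label-distance association
        if statistics(shuffled)[0] >= f_observed:
            hits += 1
    return r2, (hits + 1) / (permutations + 1)

# Hypothetical data: 8 samples in two groups, close within, far between.
labels = ["A"] * 4 + ["B"] * 4
dist = [[0.0 if i == j else (0.2 if labels[i] == labels[j] else 0.8)
         for j in range(8)] for i in range(8)]

r2, p_value = permanova(dist, labels)
print("R2:", round(r2, 3), "p-value:", p_value)
```

Running the same test on the true and the simulated distance matrices, and comparing the two p-values, is the kind of comparison the Type I and Type II panels summarize.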

Model summary – Degree of bias

Degree of bias will be affected by
- the error rate of the platform and OTU clustering
- the gamma diversity of the environment
- the precise shape of the species abundance distribution

But not the relationship among samples

Increasing probability of sequencing error and over-splitting OTUs increases bias

[Figure: normalized error (0.0 to 0.6) vs. probability of splitting (1e-04, 0.0334, 0.0667, 0.1), in two panels: no groups, two groups]

Increasing OTU richness decreases bias

[Figure: normalized error (0.0 to 0.8) vs. number of OTUs (100, 600, 1100)]

Shape of species abundance distribution (SAD) affects bias

[Figure: rank-abundance curve, abundance (0 to 5000) vs. rank (0 to 1200)]

[Figure: normalized error (0.0 to 0.8) vs. increasing SAD variance (1.5, 2.5, 3.5)]

As true community distance increases, degree of error decreases

[Figure: normalized error (0.2 to 0.6) vs. true distance (0.65 to 0.80)]

Clustering is the main error-producing step

[Figure: R^2 values (0.0 to 0.5) for true, amplified, and split communities; two groups]

Simulation overview

Categorical analysis is very robust to high-throughput sequencing errors

Degree of bias will be affected by the error rate of the sequencing platform and OTU clustering, the gamma diversity of the environment, and the precise shape of the species abundance distribution

High-throughput sequencing error leads to an overestimation of the difference between groups

Mean bias is ~20-40%; incorrect OTU clustering accounts for most of that

Steps

1. In silico: Add further complexity to simulations

2. In vitro: Empirically test artificially-created microbial communities





Air samples in a mycology classroom: a unique source distorts perceived species richness

Mycology classroom appears to be less rich than other classrooms…

[Figure: Chao estimated richness (0 to 1000) vs. individuals (0 to 8000) for classrooms A-E]
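The figure reports Chao estimated richness without saying which Chao estimator was used. As an illustration only, the sketch below assumes the bias-corrected Chao1 estimator, which adds to the observed richness a term based on the number of singletons and doubletons. The count vector is hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts per taxon for one air sample.
counts = [5, 4, 3, 2, 2, 1, 1, 1, 1]
print("observed richness:", sum(1 for c in counts if c > 0))
print("Chao1 estimate:", chao1(counts))
```

Because the estimate depends on how many rare taxa are detected, a sample dominated by a few very abundant taxa can look less rich at the same sequencing depth.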

… but has higher biomass

[Figure: Penicillium spore equivalents (0 to 200) by classroom (A-E)]

Composition of non-mycology classrooms is similar

[Figure: taxon proportions (0 to 100) by classroom (A-E)]

Mycology classroom dominated by a few taxa

Puffballs dominate mycology classroom

Pisolithus, aka dog turd fungus
Battarrea, tall stiltball
Lycoperdon, common puffball

Adams et al., in review

Beta diversity of mycology classroom: distinct communities

[Figure, built up over three slides: NMDS ordination (NMDS1 vs. NMDS2) showing observed, taxonomy-reassigned, and abundance-reassigned communities]

Conclusions

• While deciphering alpha diversity is problematic:
- Inflated alpha due to sequence error & clustering
- Deflated alpha due to unevenness
beta diversity calculations are robust to these errors in high-throughput sequencing

• Empirical tests will be used to corroborate conclusions of in silico simulations

• High-throughput sequencing will continue to be a promising tool for microbial ecologists

References – potential biases in high-throughput sequencing

DNA extraction: Frostegard et al Appl Environ Microbiol 1999; DeSantis et al FEMS Microbiology 2005; Feinstein et al Appl Environ Microbiol 2009; Morgan et al PLoS ONE 2010; Delmont et al Appl Environ Microbiol 2011

PCR amplification/Relative abundance: Amend et al Mol Ecol 2010; Engelbrektson et al ISME Journal 2010; Bellemain et al BMC Microbiol 2010; Schloss et al PLoS ONE 2011; Pinto & Raskin PLoS ONE 2012; Klindworth et al Nucleic Acids Res 2013

Sequencing error/Chimeras/OTU clustering: Huse et al Genome Biol 2007; Huse et al Environ Microbiol 2010; Kunin et al Environ Microbiol 2010; Quince et al BMC Bioinformatics 2010; Lee et al PLoS ONE 2012; Pinto & Raskin PLoS ONE 2012; Bachy et al ISME Journal 2013

Sequencing platform/protocol: Morgan et al PLoS ONE 2010; Luo et al PLoS ONE 2012

Even sampling depth: Schloss et al PLoS ONE 2011; Gihring et al Environ Microbiol 2012

Denoising: Gaspar & Thomas PLoS ONE 2013

Empirical test of simulation results

[Figure: normalized error (0.0 to 0.8) vs. number of OTUs (100, 600, 1100)]

PCR bias

PCR bias: beta distribution with a=0.5, b=1.0

[Figure: density of the beta distribution, x from -0.2 to 1.2]

Scatter around line of true abundance versus amplified abundance

[Figure: amplified abundance (0 to 1400) vs. true abundance (0 to 1200)]

OTU splitting bias

Split bias: binomial distribution with n=100

[Figure: density vs. number of splits (0 to 20) for p=0.0001, p=0.001, p=0.0334, p=0.0667]

Split location: beta distribution with a=b=0.5

[Figure: density vs. location of split (0.0 to 1.0)]
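To give a feel for what the split-bias settings imply, the sketch below evaluates the Binomial(n=100, p) distribution for the four p values in the legend. It reports the expected number of splits and the probability of no split at all. Reading n as the number of trials per OTU is an assumption, since the slide does not say what the 100 trials represent.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 100  # number of trials, from the slide
summary = {}
for p in (0.0001, 0.001, 0.0334, 0.0667):  # p values from the legend
    expected_splits = n * p
    p_no_split = binomial_pmf(0, n, p)
    summary[p] = (expected_splits, p_no_split)
    print(f"p={p}: expected splits={expected_splits:.2f}, "
          f"P(no split)={p_no_split:.3f}")
```

At the lowest setting nearly every draw gives zero splits, while at the highest the expected count is several splits per draw.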