Microbial Ecology - SOEST · 2007-12-03 · Microbial Ecology Modeling Taxa-Abundance Distributions...



Microbial Ecology

Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data

William T. Sloan (1), Stephen Woodcock (1), Mary Lunn (2), Ian M. Head (3) and Thomas P. Curtis (3)

(1) Department of Civil Engineering, University of Glasgow, Oakfield Avenue, Glasgow G12 8LT, UK
(2) Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
(3) School of Civil Engineering and Geosciences, University of Newcastle upon Tyne, Newcastle NE1 7RU, UK

Received: 22 June 2006 / Accepted: 10 July 2006 / Online publication: 13 December 2006

Abstract

We show that inferring the taxa-abundance distribution of a microbial community from small environmental samples alone is difficult. The difficulty stems from the disparity in scale between the number of genetic sequences that can be characterized and the number of individuals in communities that microbial ecologists aspire to describe. One solution is to calibrate and validate a mathematical model of microbial community assembly using the small samples and use the model to extrapolate to the taxa-abundance distribution for the population that is deemed to constitute a community. We demonstrate this approach by using a simple neutral community assembly model in which random immigrations, births, and deaths determine the relative abundance of taxa in a community. In doing so, we further develop a neutral theory to produce a taxa-abundance distribution for large communities that are typical of microbial communities. In addition, we highlight that the sampling uncertainties conspire to make the immigration rate calibrated on the basis of small samples very much higher than the true immigration rate. This scale dependence of model parameters is not unique to neutral theories; it is a generic problem in ecology that is particularly acute in microbial ecology. We argue that to overcome this, so that microbial ecologists can characterize large microbial communities from small samples, mathematical models that encapsulate sampling effects are required.

Correspondence to: William T. Sloan; E-mail: [email protected]

DOI: 10.1007/s00248-006-9141-x · Volume 53, 443–455 (2007) · © Springer Science + Business Media, Inc. 2006

Introduction

Characterizing large microbial communities from sparse genomic data requires some mathematical model of a pattern in community structure. Identifying such models remains one of the greatest challenges in microbial ecology. One approach, perhaps the ideal one, is to target particular communities and conduct very intensive surveys that will yield sufficient data for patterns in community structure to become readily apparent or for empirical models to be teased out of the data in statistical analyses. However, this approach is speculative and, because we currently do not know what constitutes "sufficient data", such surveys are difficult to rationally plan or cost [6]. The alternative is to postulate a mathematical model in advance of a survey and tailor the sampling accordingly. This approach is also speculative in that the onus is on the microbial ecologist to lay down a priori theoretical conjectures, often backed by little more than intuition, which can be translated into mathematical models. The latter approach has yielded success. The observation of power–law taxa–area relationships in bacteria [2, 12] and microbial eukaryotes [10] is a major breakthrough in microbial ecology, which did not arise by chance. Researchers reasoned that one of the well-known ecological relationships observed with macroorganisms may also apply to microorganisms and tailored an experimental program to test this hypothesis.

These studies, like all others in microbial ecology, are made difficult because, even using the most up-to-date genomic approaches, we are limited to analyzing a small fraction of the genes in very small environmental samples [28]. The disparity between sample and community size is enormous and far exceeds that for macroorganisms. Take, for example, a 10-g sample of soil: this can comprise as many as $10^{10}$ individual microorganisms (approximately the human population of the world), yet clone libraries generated from soil samples typically represent a random sample of tens to a couple of hundred individuals. Intuitively, such a small sample has the potential to distort our view of the large population. Although microbial ecologists are well aware of this disparity of scale, it does not routinely affect the way genomics data are interpreted. Taxa-abundance distributions, for example, are used to characterize microbial community structure, but what they actually characterize is the distribution of taxa abundances in a very small sample. The disparity of scale between samples and the communities they aim to represent means that the sample and community distributions can be very different indeed. To demonstrate this, we have derived the sample distribution for 200 individuals (equivalent to clones in a 16S rRNA gene library) selected at random from large populations ($10^{12}$ individuals) in which taxa abundances are distributed in four different ways (Fig. 1), two of which have been previously proposed as plausible theoretical distributions [5, 9] (Fig. 1a,b) and two of which have no biological basis and could be considered ridiculous (Fig. 1c,d). All the sample distributions have a very similar shape that is redolent of the distribution of taxa abundances in real 16S rRNA gene clone libraries. Thus, for example, the fact that clone abundance distributions look like the tail end of a lognormal distribution does not mean that the taxa in the larger community are distributed lognormally (although they might be). Descriptors such as diversity indices, taxa-abundance distributions, and similarity indices have their roots in the ecology of macroorganisms, which are easier to observe, and rely on a fairly complete census of the organisms at a particular site. Figure 1 demonstrates that for microbial communities these descriptors may differ significantly between sample and community.

Molecular methods are rapidly evolving and will offer partial solutions. Thus, when very high throughput sequencing becomes routinely available to microbial ecologists, a complete census of a sample may become possible. However, improved molecular methods will only take us so far; the number of individuals in samples, even when we can identify all of them, will still be very small in comparison to those in the microbial communities as a whole. It will always be necessary to infer larger-scale descriptors of community structure from very small samples, which requires consideration of sampling effects. Nonetheless, the fact that patterns exist in the common taxa suggests generic patterns that might extend deeper into the community.
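The sampling effect described above can be explored with a small simulation. The following is a minimal sketch, not the derivation used for Fig. 1: it assumes that drawing 200 clones from a very large community can be approximated by sampling with replacement from the relative abundances, and the function names and the lognormal spread (`sigma = 3.0`) are illustrative assumptions rather than values from the text.

```python
import random
from collections import Counter


def lognormal_community(n_taxa, sigma, rng):
    """Relative abundances of n_taxa taxa drawn from a lognormal distribution."""
    raw = [rng.lognormvariate(0.0, sigma) for _ in range(n_taxa)]
    total = sum(raw)
    return [r / total for r in raw]


def even_community(n_taxa):
    """Relative abundances when every taxon is equally abundant (cf. Fig. 1c)."""
    return [1.0 / n_taxa] * n_taxa


def sample_clones(rel_abund, n_clones, rng):
    """Pick n_clones individuals at random and count clones per detected taxon."""
    # Assumption: the community is so much larger than the sample that drawing
    # with replacement from the relative abundances is an adequate approximation.
    picks = rng.choices(range(len(rel_abund)), weights=rel_abund, k=n_clones)
    return Counter(picks)


def abundance_histogram(clone_counts):
    """Return {clones per taxon: number of taxa observed with that many clones}."""
    return dict(sorted(Counter(clone_counts.values()).items()))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    communities = {
        "lognormal": lognormal_community(4290, 3.0, rng),  # sigma is an assumption
        "even": even_community(200),
    }
    for name, rel_abund in communities.items():
        print(name, abundance_histogram(sample_clones(rel_abund, 200, rng)))
```

Comparing the printed histograms for different community shapes gives a feel for how much, or how little, a 200-clone sample reveals about the underlying distribution.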

Deriving Patterns from a Model of Community Assembly

How then does one sensibly postulate patterns and mathematical models to describe microbial communities that apply to the whole community, not just the abundant taxa? The rationale that is proposed here recognizes that all patterns in community structure derive from the processes of community assembly. Thus, it is the balance between the evolutionary and ecological processes of speciation, environmental selection, dispersal (or immigration), and local competition that shape the community structure. Therefore, if one can quantify these processes by using small samples and common taxa, and then by assuming that the processes act upon the whole community, it may be possible to extrapolate patterns such as taxa–area relationships, and simple models of them (e.g., power–law relationships), in whole-community structure. By embarking on such a strategy, microbial ecologists will enter the debate and controversies that are being played out in theoretical ecology. Verifying that a particular model holds true may require extensive surveys and adaptations of current genomic and metagenomic methods. However, with a quantitative hypothesis to test, there is little doubt that the ingenuity of molecular microbial ecologists will prevail and we can begin to test candidate models and iterate toward a predictive theoretical microbial ecology.

The rationale, therefore, is straightforward. Implementing it, however, requires the derivation of a process-based mathematical model of microbial community assembly, which is less straightforward. Our observations on microbial communities are so sparse that we cannot yet aspire to test some of the more subtle hypotheses on the balance of ecological forces that shape communities of macroorganisms. A model that encapsulates every process that is known to affect the structure of complex, diverse, and densely inhabited microbial communities would be rendered useless, in all but the most abstract of analyses, by our inability to parameterize it. Therefore, pragmatism dictates that a model should employ prudent simplification with a view to explaining some of the community structure. Here we demonstrate our rationale by using a very simple conceptual model based on a few fundamental truths that occur in any open biological community: organisms multiply, die, immigrate, and emigrate, and some taxa become extinct whereas other, new ones invade. MacArthur and Wilson [17] prudently pared down their model of community assembly in the theory of Island Biogeography to encapsulate these processes alone. Their intuition that such a simple model could explain the variance in diversity in insular communities of macroorganisms has been borne out by its central and enduring position in macroecological theory. Importantly, because, in the balance between immigration and extinction, their model proffers a mechanism for maintaining, enhancing, or depleting diversity, it has been used in conservation ecology as a predictive tool for designing nature reserves [11]. In so many applications of microbial ecology, the ultimate goal is to manipulate microbial communities to our advantage. This requires knowledge of how patterns in communities form, not just an observation of their existence.

Thus, a reasonable conjecture is that a model based on the same processes as the theory of Island Biogeography might explain patterns currently observed in microbial communities, and predict patterns that might be observed in the future. However, the theory of Island Biogeography makes predictions on the total diversity of a community and therefore cannot be parameterized by using data from small microbial samples typically used to characterize microbial communities and patterns in the abundance of common taxa. A route to applying the principles of the theory of Island Biogeography to microbial communities was offered by the recent neutral community models (NCMs) of Hubbell [14] and Bell [1], which extend the theory to make predictions on the relative abundance of taxa in the community, not just the diversity. Applying these to very large communities, and parameterizing them using data collected by molecular microbial ecologists, required two adaptations that we have presented in [23]. First, the mathematics of the original NCMs is discrete, which means every birth, death, and immigration event in the assembly of a community is represented. This becomes impractical in very large populations and therefore a continuous mathematical model was developed. Second, the published methods for calibrating the model rely on an almost complete description of the taxa-abundance distribution for a community, which does not exist for microbial communities in any natural environment. Therefore, a method was developed for calibrating the model using the small-sample taxa-abundance distributions that are typically collated by using molecular approaches to microbial community analysis. This method, which relies on multiple samples, was applied to successfully calibrate the model using published data for functional genes, conserved 16S rRNA sequences, and groups of organisms in wastewater treatment plants, estuaries, lakes, and the human lung (Fig. 2).

Figure 1. Distribution of taxa abundances in communities of $10^{12}$ individuals and in small samples of 200 individuals from them for: (a) a lognormally distributed community, $N_T/N_{max} = 5$, diversity = 4290 ($N_T/N_{max}$ is the ratio of the total number of individuals to the number of individuals belonging to the most abundant taxon, which can be used to index richness [5]); (b) a logseries distributed community, $\theta = 50$, diversity = 105 ($\theta$ is one of the parameters of the logseries that can be used as an index of species richness [14]); (c) a community where 200 taxa are equally abundant; (d) a bimodal distribution, diversity = 1000. In each panel the community plot shows the density (or number) of taxa against $\log_2$(abundance) and the sample plot shows the number of taxa against abundance.

W.T. SLOAN ET AL.: NEUTRAL ASSEMBLY OF PROKARYOTE COMMUNITIES

The evidence presented by Sloan et al. [23] is sufficient to suggest that the continuous NCM is more than just conjecture and that it could potentially inform us of larger-scale patterns in microbial community structure. We do not suggest that NCMs are the only candidate models for community assembly. We merely propose that the evidence warrants their promotion to a theory that deserves further investigation. However, to do this requires a fuller description of the local community structure than has previously been published. In the work of Sloan et al. [23], the model is defined for a single taxon embedded in a neutral community. To infer the whole community structure requires this to be extended to a tractable mathematical description of the taxa-abundance distribution for all taxa within a functional group. The main result of this article is a generic description for the expected taxa-abundance distribution for a microbial community undergoing neutral dynamics. An analysis of the sensitivity of taxa-abundance distributions to changes in immigration and local population size serves to reinforce the importance of these variables and chance in shaping community structure. In particular, high immigration rates promote diversity, whereas low immigration rates deplete diversity and promote the dominance of common taxa. The analysis also highlights the scale dependence of model parameters, and that further advances are required in the description of how local communities aggregate to form the source community before large-scale patterns in community structure can be inferred.

Figure 2. Comparing the theoretical and observed relationship between the mean relative abundance of a taxon, $p_i$, and the frequency with which it appears in a fixed population size. Each of the points represents a different taxon. (a) Clone libraries of different ammonia monooxygenase (AMO) genes at 13 different domestic sewage works [26], $m = 0.1$. (b) Clone libraries of different ammonia-oxidizing bacteria 16S rRNA genes at six sites from the Humber Estuary [16], $m = 0.7$. (c) 16S rRNA sequences for 16 different bacterial taxa that are considered to be particular to freshwater environments sampled from 96 different lakes [29]. Before the analysis, we removed data that represented three cyanobacterial lineages, leaving only data from putative heterotrophs, and expressed proportional abundance as a fraction of the overall noncyanobacterial abundance. The lowest relative abundance detected in a single analysis, 1/480, was used to define the detection limit of the technique, $N_T m = 1.36$. (d) Clones from the lungs of 24 patients with and without asthma (Wardlaw and Barer, personal communication), $N_T m = 14.6$. In each panel, frequency (0 to 1) is plotted against mean relative abundance ($p_i$); panel (c) uses the mean proportion of signal detected.

Taxa-Abundance Distribution for Neutrally Assembled Microbial Communities

The conceptual model that underpins the NCM is very similar to that of the theory of Island Biogeography: there is some larger community (mainland) that acts as a source of immigrants to a local community (island) and it is the balance between immigrations and local extinctions that determines community structure. With the NCM, the local community is assumed to be saturated with $N_T$ individuals. For the assemblage of organisms in the local community to change, an individual must die or leave the system, which occurs at random with a rate $d$. It is then immediately replaced by an individual, either by reproduction from within the community or by an immigrant selected at random from a source community comprising $n$ different taxa with abundances $\{p_i\}_{i=1}^{n}$. The probability, $m$, that a vacancy in the local community is filled by an immigrant is a taxon-independent constant. There is an assumption that all vacancies are filled by reproduction or immigration and, therefore, the model does not currently accommodate a population of previously dormant spores suddenly becoming active. Using only $\{p_i\}_{i=1}^{n}$, $m$ and $N_T$, simple expressions can be derived to track the likely change in abundance of any taxon in the local community through time. For example, consider the probability that the abundance of the $i$th taxon, $N_i$, in the community increases by one individual in a time period $1/d$. This first requires a death/removal of an individual belonging to some other taxon, which occurs with probability $1 - N_i/N_T$. Then, either an individual from the $i$th taxon in the source community has to migrate into the vacant space, which has probability $m p_i$, or there is no immigration but instead a local reproduction of the $i$th taxon in the local community, which occurs with probability $(1 - m)\,N_i/(N_T - 1)$. Thus the probability that the abundance of the $i$th taxon increases by one individual is simply given by

$$\Pr(N_i + 1 \mid N_i) = \left(1 - \frac{N_i}{N_T}\right)\left[m p_i + (1 - m)\frac{N_i}{N_T - 1}\right]. \tag{1}$$

Similar expressions can be derived for a decrease and no change in $N_i$ and these form the basis of Hubbell's discrete NCM (Appendix) and our continuous version of it [23] for large microbial populations. The adjective "neutral" derives from the fact that the probability of a reproduction, $N_i/(N_T - 1)$, is solely dependent on the relative abundance of taxa. Thus density-dependent growth is represented in the model but the specific growth rates of all taxa are equivalent. This assumption will be violated over short time periods and is controversial [7]. However, it has been demonstrated, by allowing taxa to have differentiated specific growth rates, that provided there is a constant stream of immigrants the model is robust to modest departure from the purely neutral assumption [23].

To describe the whole-community taxa-abundance distribution for a local community requires a substantial extension of our previous mathematical (not conceptual) description of the NCM. In Sloan et al.'s study [23] the model describes how the marginal probability distributions of the abundance of taxa change through time. To infer the whole community structure requires a description of how the joint probability distribution, $\phi(x_1, x_2, \ldots, x_{n-1})$, for the abundance of taxa changes through time, which we show in the Mathematical Appendix is governed by

$$\frac{\partial \phi}{\partial t} = \sum_{i=1}^{n}\left(-\frac{\partial\left(M_{\delta x_i}\phi\right)}{\partial x_i} + \frac{1}{2}\frac{\partial^2\left(V_{\delta x_i}\phi\right)}{\partial x_i^2}\right) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i}\frac{\partial^2\left(C_{\delta x_i \delta x_j}\phi\right)}{\partial x_i\,\partial x_j} \tag{2}$$

where $x_i$ is the relative abundance of the $i$th taxon and $M_{\delta x_i}$, $V_{\delta x_i}$ and $C_{\delta x_i \delta x_j}$ are simple functions of our model parameters $\{p_i\}_{i=1}^{n}$, $m$ and $N_T$. If the community is in "long-term" dynamic equilibrium, then the joint probability distribution will not change through time and this stationary distribution is described by the solution of Eq. (2) with $\partial\phi/\partial t = 0$ and boundary conditions $\partial\phi/\partial x_i = 0$ where $x_i = 0$ or $x_i = 1$. It is shown in the Mathematical Appendix that this is a Dirichlet distribution,

$$x_1, \ldots, x_n \sim \mathrm{Dir}\left(N_T m p_1, \ldots, N_T m p_n\right). \tag{3}$$

The taxa-abundance distribution can easily be derived from this by using a simple algorithm for simulating realizations of the abundance for each taxon (Mathematical Appendix).

Figure 3. Relationship between the mean relative abundance ($p_i$) and the detection frequency: points, observed in clone libraries; solid line, given by the neutral model; dashed line, assuming that the abundance of the $i$th taxon in a community of size $N_T$ is distributed binomially, $\Pr(K = k) = \binom{N_T}{k} p_i^{k} (1 - p_i)^{N_T - k}$. (a) For AMO genes at 13 different domestic sewage works [26], with immigration probability in the neutral model $m = 0.1$. (b) AOB 16S rRNA genes at six sites from the Humber Estuary [16], with immigration probability in the neutral model $m = 0.7$. In both panels, frequency (0 to 1) is plotted against mean relative abundance ($p_i$).
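Realizations from the Dirichlet distribution in Eq. (3) can be simulated with standard tools. The sketch below uses the textbook construction of a Dirichlet variate from normalized gamma variates; it is an assumption that this resembles the algorithm in the Mathematical Appendix, which is not reproduced here, and the source community and parameter values are purely illustrative.

```python
import random


def sample_local_community(source, n_total, m, rng):
    """Draw one set of relative abundances from Dir(N_T*m*p_1, ..., N_T*m*p_n)."""
    # Standard construction: independent gamma variates, normalized to sum to one.
    gammas = [rng.gammavariate(n_total * m * p, 1.0) for p in source]
    total = sum(gammas)
    if total == 0.0:
        # Very small shape parameters can underflow to zero in floating point.
        raise ValueError("all gamma draws underflowed; N_T*m*p_i is too small for this sketch")
    return [g / total for g in gammas]


if __name__ == "__main__":
    rng = random.Random(42)
    n_taxa = 20
    weights = [0.7 ** k for k in range(n_taxa)]   # hypothetical source community
    source = [w / sum(weights) for w in weights]
    n_total = 1e6                                 # hypothetical local community size
    for m in (1e-3, 1e-5, 5e-7):                  # N_T * m = 1000, 10, 0.5
        x = sample_local_community(source, n_total, m, rng)
        ranked = sorted(x, reverse=True)
        print(m, [round(v, 3) for v in ranked[:5]])
```

With a large product $N_T m$ the realization stays close to the source abundances; as $N_T m$ falls, single realizations tend to be dominated by a few taxa, in line with the sensitivity to immigration described in the text.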

Other authors have made significant advances on Hubbell's original discrete NCM [13, 20, 24, 25]. Our formulation is simple and we have used well-established techniques to convert Hubbell's discrete model to a continuous diffusion equation. This has the advantage of making available literature from population genetics and stochastic modeling to those who wish to predict the dynamics of microbial communities. Previously published descriptions of the taxa-abundance distribution for NCMs (e.g., [20, 25]) have been ingenious, but idiosyncratic, which makes them difficult to work with and their general properties obscure. We have shown that the joint probability distribution is, in fact, described by the well-known Dirichlet distribution for which there is a wealth of statistical literature [4]. One of the greatest advantages of using our formulation in the context of microbial communities is that it does not presuppose any particular distribution for the abundance of taxa in the source community; it is defined for an arbitrary distribution $\{p_i\}_{i=1}^{n}$. Previous analytic descriptions of the taxa-abundance distribution predicted by an NCM have assumed that the distribution of taxa abundances in the source community is logseries, characterized by a single parameter, $\theta$, which Hubbell calls the "fundamental biodiversity number." $\theta$ is an index of diversity; the larger the $\theta$ value for a functional group, the more diverse it is. However, this logseries assumption is based on Hubbell's model for the source community, which assumes neutral dynamics where biodiversity is maintained at equilibrium through speciation rather than immigration. New species appear in the population like rare point mutations; they may spread and become more abundant, or more often, die out quickly. The alternative drivers of speciation that are known to exist for microorganisms, such as lateral gene transfer, must call into question this conceptual picture of the source community. Deriving a mathematical description of the source community by aggregating local communities in some way that can be tested using genomic data remains a significant and exciting challenge. Therefore, the flexibility of being able to condition the predicted local community taxa-abundance distribution on any assumed source distribution is a distinct advantage of our formulation.
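To give a feel for how $\theta$ indexes diversity, the expected richness of a logseries community can be computed from Fisher's standard relation $S = \theta \ln(1 + N/\theta)$. This is a sketch under the assumption that Hubbell's $\theta$ plays the role of Fisher's $\alpha$ in that relation; the text does not state which formula the authors used, and the function name is hypothetical.

```python
import math


def logseries_richness(theta, n_individuals):
    """Expected number of taxa, S = theta * ln(1 + N / theta), for a log-series."""
    return theta * math.log1p(n_individuals / theta)


if __name__ == "__main__":
    # theta = 2.0 and 10^20 individuals are the values quoted in the Fig. 4 caption.
    print(round(logseries_richness(2.0, 1e20), 1))
    # Richness grows with theta for a fixed community size.
    for theta in (1.0, 2.0, 10.0, 50.0):
        print(theta, round(logseries_richness(theta, 1e12)))
```

For $\theta = 2.0$ and $10^{20}$ individuals this relation gives roughly 90 taxa, which is of the same order as the "approximately 100 taxa" quoted in the Fig. 4 caption.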

What Can We Infer about Prokaryote Community Structure from Small Samples?

How then does our derivation of the taxa-abundance distribution help to infer patterns in microbial communities? This is perhaps best explained by considering two of the example data sets displayed in Fig. 2 in more detail: the clone libraries of ammonia monooxygenase (AMO) genes [21, 26] from 13 different sewage works in Germany; and the ammonia-oxidizing bacteria 16S rRNA gene data from six samples at three different sites in the Humber estuary in England [16]. On average, 13 clones were sampled from each of the sewage works samples and exactly 20 were sampled for the estuary samples. As argued previously, this is a small sample from which to draw conclusions on the community structure at any one site. However, using the technique reported by Sloan et al. [23], it is possible to calibrate the NCM based on the distribution of taxa abundance across the 13 sewage works or six estuary samples for the common taxa. If the average relative abundance of a taxon is $p_i$ then, if neutral, its relative abundance in any one of the samples, $x_i$, is beta distributed: $x_i \sim \mathrm{Beta}\left(x_i : N_T m p_i,\ N_T m (1 - p_i)\right)$. Here, we are assuming that the small random sample of clones constitutes the community; therefore, $N_T$ is the average number of clones and $m$ is the immigration probability as previously defined. Knowing the theoretical probability density distribution for the $i$th taxon allows us to calculate the probability that the taxon exists at an abundance greater than the detection limit of whatever molecular method is being used. It is simply

$$\Pr(x_i > d) = \int_{d}^{1} \mathrm{Beta}\left(x_i : N_T m p_i,\ N_T m (1 - p_i)\right)\, dx_i \tag{4}$$

where $d$ is the detection limit. For clone libraries, the detection limit in a sample is one clone and so $d = 1/N_T$. $N_T$ is known and $p_i$ is the average relative abundance across all the sewage works. Therefore, Eq. (4) gives the theoretical relationship between $p_i$ and the probability (or frequency) of detecting the $i$th taxon in any sample as a function of one unknown taxon-independent parameter, the immigration probability $m$. $m$ can be simply calibrated by adjusting it to minimize the difference between this theoretical probability of detection and the observed relative frequency with which the common taxa are observed (Fig. 2). For the sewage works samples, the calibrated value of $m$ is 0.1, and for the estuary samples it is 0.77; this is the probability that when an ammonia-oxidizing bacterium is lost from the system it is replaced from outside.
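The calibration described above can be illustrated numerically. The following is a rough sketch, not the authors' procedure: it estimates the integral in Eq. (4) by Monte Carlo sampling from the beta distribution (an assumption made to keep the example within the Python standard library) and picks $m$ from a hypothetical list of candidates by least squares. All input values are synthetic rather than the published data sets.

```python
import random


def detection_probability(p_i, n_total, m, rng, draws=20000):
    """Monte Carlo estimate of Eq. (4), Pr(x_i > d), with d = 1/N_T."""
    a = n_total * m * p_i
    b = n_total * m * (1.0 - p_i)
    d = 1.0 / n_total
    hits = sum(1 for _ in range(draws) if rng.betavariate(a, b) > d)
    return hits / draws


def binomial_detection(p_i, n_total):
    """Chance of at least one clone of taxon i under plain binomial sampling."""
    return 1.0 - (1.0 - p_i) ** n_total


def calibrate_m(mean_abund, observed_freq, n_total, candidates, seed=0, draws=4000):
    """Return the candidate m minimizing the squared misfit to observed frequencies."""
    best_m, best_err = None, float("inf")
    for m in candidates:
        rng = random.Random(seed)  # same random stream for every candidate
        err = sum(
            (detection_probability(p, n_total, m, rng, draws) - f) ** 2
            for p, f in zip(mean_abund, observed_freq)
        )
        if err < best_err:
            best_m, best_err = m, err
    return best_m


if __name__ == "__main__":
    # Hypothetical inputs: five taxa, 20 clones per library, "true" m = 0.3.
    mean_abund = [0.02, 0.05, 0.1, 0.2, 0.4]
    n_total = 20
    truth = random.Random(7)
    observed = [detection_probability(p, n_total, 0.3, truth) for p in mean_abund]
    print(calibrate_m(mean_abund, observed, n_total, [0.05, 0.1, 0.3, 0.6, 0.9]))
    print([round(binomial_detection(p, n_total), 3) for p in mean_abund])
```

A grid of candidates is used here only for simplicity; any one-dimensional minimizer over $m$ would serve the same purpose. The binomial function is included for comparison with the purely random-sampling curve mentioned in the Fig. 3 caption.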

It would appear from the above analysis that immigration of AOB into German sewage works and into samples from the Humber estuary is high. Thus, perhaps dispersal limitation is not a major driver in shaping community structure in these communities. So, let us ignore immigration and assume that environments are all the same and exactly the same structuring forces act on the communities, and thus the distribution of taxa abundances is the same in each community. In this case, as noted above, the stochastic effects of random sampling will mean that taxa are absent from some clone libraries and present in others purely by chance. For example, if the relative abundance of an organism is 0.5, then we are very likely to see it in all clone libraries drawn from the

[Figure 4: four log-log panels, (a) to (d), each plotting Log10(Relative Abundance) against Log10(Taxon's Rank); panels (b) and (d) show curves for m = 1, m = 10^-7 and m = 1.5 × 10^-9.]

Figure 4. (a) Ranked relative abundance distribution for a logseries distribution with Hubbell's [14] biodiversity number $\theta = 2.0$, which gives approximately 100 taxa in a source population of $10^{20}$ individuals. $\theta$ was calibrated by using the AMO gene clone libraries from the sewage works. (b) The ranked abundance distribution in a local neutrally assembled community of $10^{9}$ individuals that has been fed with immigrants from the source community in (a). (c) The ranked relative abundance distribution for a lognormal distribution in a source population of $10^{20}$ individuals. The parameters of the lognormal were calibrated to be $\alpha = \ln(2)/\sqrt{2\sigma^{2}} = 0.085$ and diversity $= 5 \times 10^{5}$. (d) Ranked abundance distribution in a local neutrally assembled community of $10^{9}$ individuals that has been fed with immigrants from the source community in (c).


communities, whereas an organism with relative abundance 0.001 will rarely appear in small clone libraries. This then raises the question, how much of an effect does immigration have on the community structure over and above that which is attributable to random sampling from identical communities? If the clone libraries were unbiased random samples from identical communities then $p_i$, the mean relative abundance, is the probability that an organism picked at random from the sample belongs to the ith taxon. Thus if $K$ is the number of clones that belong to the ith taxon in a random sample of size $N_T$ clones, it will be distributed binomially,

$$\Pr(K = k) = \binom{N_T}{k} p_i^{k} (1 - p_i)^{N_T - k} \tag{5}$$

and, therefore, the probability of observing the taxon in the sample is

$$\Pr(K \geq 1) = 1 - (1 - p_i)^{N_T} \tag{6}$$

This has been plotted for the sewage works and estuary data in Fig. 3. Clearly, immigration has an effect over and above random sampling from identical communities for the sewage work and estuary samples.
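The pure-sampling baseline of Eq. (6) is simple enough to evaluate directly. The short sketch below does so for the two relative abundances used as examples in the text (0.5 and 0.001) and the two library sizes (13 and 20 clones); the function name is an illustrative assumption.

```python
def binomial_detection_probability(p_i, n_t):
    """Pr(K >= 1) from Eq. (6): chance that a taxon with mean relative
    abundance p_i appears at least once in a random library of n_t clones."""
    if not 0.0 <= p_i <= 1.0:
        raise ValueError("p_i must lie between 0 and 1")
    return 1.0 - (1.0 - p_i) ** n_t


if __name__ == "__main__":
    for n_t in (13, 20):            # sewage works and estuary library sizes
        for p_i in (0.5, 0.001):    # abundant and rare taxon from the text
            print(n_t, p_i, binomial_detection_probability(p_i, n_t))
```

An abundant taxon is detected in almost every library, whereas a taxon at relative abundance 0.001 is detected in only about one or two libraries in a hundred.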

So far, it has been assumed that each random sample of only a few clones can be treated as if it were an independent little local community that has been assembled neutrally. However, in the process of constructing the clone library, any spatial structure that existed in the distribution of taxa will have been obliterated. So, perhaps, it is more realistic to consider all the individual organisms in the original environmental sample as constituting the local neutrally assembled community and that the clone library represents a random sample from that. The word "perhaps" is used advisedly here because we do not currently know on what scale it is best to represent and characterize microbial communities. In many other scientific disciplines, characteristic length scales have been identified at which smaller scale variations in the properties of a system begin to average out and effective variables can be employed. For example, in modeling the geomechanics or hydrogeology of a geological formation, the rock will be broken into representative elementary volumes within each of which a single effective variable can describe strength or porosity, despite there being myriad different crystals contained within the volume. For microbial communities, the small-scale spatial and temporal variability in community structure may be large and therefore determining whether such characteristic length scales exist is important, because it is at such scales that simple models, such as the neutral model presented here, will be most effective in teasing out the ecological mechanisms that drive community assembly. Nonetheless, we assume here that all the microorganisms in the original environmental sample comprise the local neutrally assembled community and that a clone library drawn from that is a random, unbiased sample. To distinguish between these two, let $N_S$ be the number of clones in the library and retain $N_T$ to represent the number of individuals in the community. We have been able to derive expressions for the first and second moments of the probability density function (pdf) for the abundance of taxa in the clone library (Appendix), but could not derive a neat, precise analytic expression for this distribution. However, by repeatedly randomly sampling from synthetic large neutrally assembled communities, we found, unsurprisingly, that the distribution was also approximately beta distributed. Under an assumption that the distribution is exactly beta, $x_i \sim \mathrm{Beta}\left(x_i : N_S \hat{m} p_i,\, N_S \hat{m} (1 - p_i)\right)$, then by matching first and second moments (Appendix), $\hat{m}$ is related to the true probability of immigration into the community, $m$, by

$$\hat{m} = \frac{N_T m}{N_T m + N_S + 1}. \tag{7}$$

$\hat{m}$ could be considered as an effective immigration rate into the small sample that encapsulates both the dispersal limitation imposed on the community as a whole and random sampling effects. Equation (7) allows us to extrapolate from our small random samples to the immigration in the larger neutral community. In the case of the sewage works, where the effective immigration probability is 0.1, the immigration probability for a neutral community of $10^{9}$ organisms would be $1.55 \times 10^{-9}$. For the estuary, where the effective immigration is 0.77, the immigration probability for a neutral community of $10^{9}$ organisms is an order of magnitude higher, but still low at $7.0 \times 10^{-8}$. This would indicate that immigration for both environments is low if a representative element of the microbial landscape comprises $10^{9}$ organisms. On inspecting Eq. (7), it is apparent that for the effective immigration into a sample to vary significantly from 1, the product $N_T m$ must be of the order $N_S$. This means that when the sample size, $N_S$, is small and $N_T$ is large, the immigration probability has to be very small indeed for the effects of dispersal limitation to be apparent in the sample. Conversely, for large microbial communities, it will be impossible to distinguish between high immigration rates and the immigration probability being one. This does not mean that immigration will have no effect on the taxa-abundance distribution of the community. It just means that the effects are difficult to see in small samples unless they are pronounced.
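The extrapolation from a calibrated effective immigration probability to the community-scale value is a direct inversion of Eq. (7). The sketch below reproduces the two figures quoted above; the function names and the variable name `m_hat` (for the effective immigration probability) are assumptions made for the example.

```python
def effective_immigration(m, n_t, n_s):
    """Eq. (7): effective immigration probability seen in a sample of
    n_s clones drawn from a neutral community of n_t individuals."""
    return n_t * m / (n_t * m + n_s + 1.0)


def community_immigration(m_hat, n_t, n_s):
    """Invert Eq. (7) for m, given a calibrated effective value m_hat."""
    if not 0.0 <= m_hat < 1.0:
        raise ValueError("m_hat must satisfy 0 <= m_hat < 1")
    return m_hat * (n_s + 1.0) / (n_t * (1.0 - m_hat))


if __name__ == "__main__":
    n_t = 1e9  # assumed size of the representative neutral community
    print(community_immigration(0.10, n_t, n_s=13))  # sewage works, about 1.56e-9
    print(community_immigration(0.77, n_t, n_s=20))  # estuary, about 7.0e-8
```

Because $m$ scales as $1/N_T$ in the inverted form, the extrapolated value is only as reliable as the assumed community size.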

Now we are in a position to use the main innovation presented in this article, the mathematical description of


the taxa-abundance distribution for large communities, to investigate the overall community structure. Again, retaining the community of $10^{9}$ organisms as a representative element in the community, we can demonstrate the sensitivity of the community structure to changes in immigration. However, to do this requires an assumption on the nature of the source community based on very little evidence. We have used two distributions to describe the source community: first, a logseries distribution, which derives from Hubbell's NCM for macroorganisms; second, a lognormal distribution, for which some theoretical and empirical justification has been outlined by Curtis et al. [5]. These were fitted, using least squares, to the average clone abundance data $\{p_i\}$ for the sewage works. The fact that two such different distributions can be fitted equally well to the data highlights our uncertainty in the underlying source community model. Figure 4a and c show the source community abundance distributions for a large source community of $10^{20}$ individuals. Figure 4b and d show what the local neutral community taxa-abundance distributions would be if the immigration probability was 1, that is, if every local replacement was an immigrant. As expected in this case, the distribution closely reflects the source community distribution. When the immigration probability is the calibrated value of $1.55 \times 10^{-9}$, the expected abundance of the common taxa increases and the diversity in samples (indicated by the taxon's rank in Fig. 4) drops from 50 to 10 taxa in the local community fed with immigrants from a logseries distributed source community and from $10^{4}$ to 10 in the local community fed with immigrants from the lognormally distributed source. There are more rare taxa associated with a lognormally distributed source community and therefore the percentage decrease in biodiversity is much greater. The subjective result that low immigration will result in a local depletion of biodiversity will hold provided there are rare taxa in the source community, no matter how the taxa are distributed. Now, suppose that the designer of the sewage works analyzed in this study wanted to engineer functional redundancy in the ammonia-oxidizing bacterial community into their system. This might be possible by artificially increasing the immigration rate of AOB into the local community in the treatment reactor. Figure 4 also shows that increasing the immigration probability by a factor of 100 can effect a large change in biodiversity.

Discussion

This work demonstrates how mathematical modeling is an indispensable guide to the rational exploration of the microbial world. The huge discrepancy between sample size and the size of microbial communities leaves us no option. This is amply demonstrated by the simple sampling exercise outlined at the start of the article, which shows the dangers of naively extrapolating from small samples. This is important, because a proper understanding of the nature of taxon abundance curves is central to the longstanding conundrum of the extent of prokaryote diversity [6] and the curves may be (rightly or wrongly) interpreted as reflecting underlying ecological processes [18].

The model we have deployed is simple and can be calibrated. We emphasize the importance of these two attributes. A model that cannot be calibrated cannot be used to predict, and prediction is highly desirable in theoretical microbial ecology. This is because we do not know many of the basic patterns in the communities we are dealing with. Thus we need to extrapolate from the data and patterns we can observe, to make predictions about community structure. These can then be tested by using appropriately targeted experimental programs. The low number of parameters deployed in our model arises from its conceptual simplicity; it only considers the size of the community, births, deaths, and immigration. It might be argued that the model is too simple to offer any guidance. However, the model does appear to be consistent with patterns observed in microbial communities [23], and the theory has been successfully applied to higher organisms [14]. This does not preclude the possibility of further refinements, or the necessity of rigorous testing. However, it does suggest that it constitutes a sound foundation for the rational exploration of the microbial world.

Our steady-state solution to the multitaxon neutral model affords the opportunity of predicting the whole community taxon abundance distribution on the basis of observations made on the common taxa. In addition, it allows us to predict how changes in immigration will affect local community structure and diversity. The ability to extrapolate from patterns in common taxa, which have been identified using sparse molecular data, to patterns in the rarer taxa is essential in addressing the role of diversity and community structure in the ecosystem services provided by microorganisms.

Given the foregoing, it is unsurprising that sampling should be considered when calibrating this, or any other model of community assembly. The relationship between the immigration probability, $\hat{m}$, calibrated in samples and the actual immigration, $m$, into a local neutrally assembled community [Eq. (7)] is important. First, it demonstrates that what appear to be high immigration probabilities in samples can translate to very low probabilities of immigration into communities. Second, by using the very small samples that are typical of many microbial ecology surveys, the calibration method [22] can only quantify immigration for systems that are highly dispersal limited, where the immigration probability is very low. Conversely, much larger samples will


be required to distinguish moderately dispersal limited communities from those that are randomly assembled. Finally, it highlights our uncertainty about the scale at which we should be characterizing microbial communities.

Our ability to calibrate immigration in samples suggests that an NCM at least partly explains community structure. However, to extrapolate to an immigration probability for the community by using Eq. (7) requires a knowledge of the size, $N_T$, of the neutral community. In the sewage works example, is $N_T$ the population of the whole sewage works, in which case immigration would be very low indeed, or are some smaller units, such as flocs, assembling neutrally? Although we do not know the $N_T$ values for the AOB in sewage works, we can be confident that there are of the order of $10^{6}$ to $10^{8}$ [3] in a milliliter. It follows, therefore, that the true $m$ values will be very small even in small samples. This may have important implications for the debate on the biogeography of bacteria [8]. It is, however, undoubtedly true that this controversial field would benefit from the rigor that appropriately parameterized mathematical models can bring to a debate.

By necessity, we have focused on characterization of microbial communities based on a single genetic locus (amoA or 16S rRNA genes); however, the arguments relating to the extent and structuring forces of microbial diversity are equally relevant in the context of environmental genomic studies. It is now well accepted that rRNA genes are a conservative marker of microbial diversity, and diversity at the level of the whole genome is likely to be somewhat greater than suggested by 16S rRNA sequence-based analyses. This makes it all the more pressing that we develop rigorous mathematical approaches to provide a foundation upon which the growing resource of environmental genomic data can be interpreted.

Appendix: Mathematical Appendix

Kolmogorov Backward Equation for the neutral community model. The basis of the model is Hubbell's NCM in which the community is saturated with a total of $N_T$ individuals; and for an assemblage to change, an individual must die or leave the system. This occurs at a taxa-independent rate $d$. The dead individual is immediately replaced by an immigrant from a source community, with probability $m$, or by reproduction of a member of the local community with probability $1 - m$. Thus, the community forms and develops through a continuous cycle of immigration, reproduction, and death. Assuming that deaths are uniformly distributed in time, then during a period of time $1/d$ one death is expected and the ith species, with initial absolute abundance $N_i$, will either increase by 1, stay the same, or decrease by 1, with probability given by the following three expressions, respectively:

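$$\Pr(N_i + 1 \mid N_i) = \left(\frac{N_T - N_i}{N_T}\right)\left[m p_i + (1 - m)\left(\frac{N_i}{N_T - 1}\right)\right] \tag{8}$$

$$\Pr(N_i \mid N_i) = \frac{N_i}{N_T}\left[m p_i + (1 - m)\left(\frac{N_i - 1}{N_T - 1}\right)\right] + \left(\frac{N_T - N_i}{N_T}\right)\left[m (1 - p_i) + (1 - m)\left(\frac{N_T - N_i - 1}{N_T - 1}\right)\right] \tag{9}$$

$$\Pr(N_i - 1 \mid N_i) = \frac{N_i}{N_T}\left[m (1 - p_i) + (1 - m)\left(\frac{N_T - N_i}{N_T - 1}\right)\right] \tag{10}$$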

where $p_i$ is the relative abundance of the ith species in the source community. Hubbell used these transition probabilities for relatively small populations to form a finite Markov chain model with which the community dynamics can be investigated and the stationary probability distribution for $N_i$ can be calculated. The computational expense [19] of this discrete Markov chain formulation makes it impossible to apply to the very large diverse populations that typify the microbial world [27]. Here, we employ Kimura and Ohta's [15] methods to recast the model for large populations.
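The three transition probabilities in Eqs. (8)–(10) can be written down directly, which makes it easy to check that they sum to one and that the expected change in $N_i$ per death is $m(p_i - N_i/N_T)$. The following is a minimal sketch; the function name and the example values are assumptions chosen for illustration.

```python
def transition_probabilities(n_i, n_t, m, p_i):
    """Return (Pr increase, Pr unchanged, Pr decrease) for taxon i over
    one death-and-replacement event, following Eqs. (8)-(10)."""
    if n_t < 2 or not 0 <= n_i <= n_t:
        raise ValueError("need n_t >= 2 and 0 <= n_i <= n_t")
    frac_i = n_i / n_t                # the individual that dies is of taxon i
    frac_other = (n_t - n_i) / n_t    # the individual that dies is not
    up = frac_other * (m * p_i + (1 - m) * n_i / (n_t - 1))
    stay = (frac_i * (m * p_i + (1 - m) * (n_i - 1) / (n_t - 1))
            + frac_other * (m * (1 - p_i)
                            + (1 - m) * (n_t - n_i - 1) / (n_t - 1)))
    down = frac_i * (m * (1 - p_i) + (1 - m) * (n_t - n_i) / (n_t - 1))
    return up, stay, down


if __name__ == "__main__":
    up, stay, down = transition_probabilities(n_i=30, n_t=100, m=0.1, p_i=0.2)
    print(up, stay, down, up + stay + down)
    print("expected change per death:", up - down)
```

For communities of realistic microbial size, iterating these probabilities event by event is impractical, which is the motivation for the continuous approximation that follows.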

Let $x_i = N_i/N_T$ be the relative abundance of the ith species, and assume that $N_T$, the local community size, is large enough that $x_i$ can be considered continuous. Also, let $\phi(x_1, x_2, \ldots, x_n; t)$ be the joint pdf that the relative abundances of species $1, \ldots, n$ at time $t$ are $x_1, \ldots, x_n$, respectively. The continuous model comes from considering the expected change in $\phi$ that will occur in a small time interval $\delta t$. To do this, we define $g(x_1, \delta x_1; \ldots; x_n, \delta x_n; t, \delta t)$ to be the pdf for the relative abundance of species 1 changing from $x_1$ to $x_1 + \delta x_1$, the relative abundance of species 2 changing from $x_2$ to $x_2 + \delta x_2$, ..., and the abundance of species $n$ changing from $x_n$ to $x_n + \delta x_n$ during the time period between $t$ and $t + \delta t$.

Then,

$$\phi(x_1, \ldots, x_n; t + \delta t) = \int \phi(x_1 - \delta x_1, \ldots, x_n - \delta x_n; t)\, g(x_1 - \delta x_1, \delta x_1; \ldots; x_n - \delta x_n, \delta x_n; t, \delta t)\, d(\delta x_1) \cdots d(\delta x_n)$$


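Expanding this as an n-dimensional Taylor series about the point $x_1, \ldots, x_n$ and neglecting terms of order 3 and above gives

$$\phi(x_1, \ldots, x_n; t + \delta t) = \int \left[ \phi g - \sum_{i=1}^{n} \delta x_i \frac{\partial}{\partial x_i}(\phi g) + \sum_{i=1}^{n} \frac{(\delta x_i)^2}{2} \frac{\partial^2}{\partial x_i^2}(\phi g) + \frac{1}{2} \sum_{i=1}^{n} \sum_{j \neq i} \delta x_i\, \delta x_j \frac{\partial^2}{\partial x_i \partial x_j}(\phi g) \right] d(\delta x_1) \cdots d(\delta x_n) \tag{11}$$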

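where $\phi g$ denotes $\phi(x_1, x_2, \ldots, x_n; t)\, g(x_1, \delta x_1; \ldots; x_n, \delta x_n; t, \delta t)$. Because $\int g\, d(\delta x_i) = 1$,

$$\begin{aligned}
\phi(x_1, x_2, \ldots, x_n; t + \delta t) - \phi(x_1, x_2, \ldots, x_n; t) = {} & -\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left[\phi(p_i, x_i; t) \int (\delta x_i)\, g\, d(\delta x_i)\right] \\
& + \frac{1}{2} \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}\left[\phi(p_i, x_i; t) \int (\delta x_i)^2\, g\, d(\delta x_i)\right] \\
& + \frac{1}{2} \sum_{i=1}^{n} \sum_{j \neq i} \frac{\partial^2}{\partial x_i \partial x_j}\left[\phi(p_i, x_i; t) \int (\delta x_i)(\delta x_j)\, g\, d(\delta x_i)\, d(\delta x_j)\right]
\end{aligned} \tag{12}$$

therefore,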

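$$\frac{\partial \phi}{\partial t} = \sum_{i=1}^{n} \left\{ -\frac{\partial (M_{\delta x_i} \phi)}{\partial x_i} + \frac{1}{2} \frac{\partial^2 (V_{\delta x_i} \phi)}{\partial x_i^2} \right\} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j \neq i} \frac{\partial^2 (C_{\delta x_i \delta x_j} \phi)}{\partial x_i \partial x_j} \tag{13}$$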

where $M_{\delta x_i}$ and $V_{\delta x_i}$ are the first and second moments of the change in $x_i$ per unit of time and $C_{\delta x_i \delta x_j}$ is the expected product of changes in $x_i$ and $x_j$. This is the n-dimensional version of the Kolmogorov equation. By considering the expected changes in relative abundance in the discrete time interval $1/d$ given by Eqs. (8)–(10), $M_{\delta x_i}$, $V_{\delta x_i}$ and $C_{\delta x_i \delta x_j}$ can be approximated by

$$M_{\delta x_i} = \frac{m (p_i - x_i)}{N_T} \tag{14}$$

$$V_{\delta x_i} = \frac{2 x_i (1 - x_i) + m (p_i - x_i)(1 - 2 x_i)}{N_T^2} \tag{15}$$

$$C_{\delta x_i \delta x_j} = -\left[\frac{2 x_i x_j + m \left( x_i (p_j - x_j) + x_j (x_i - p_i) \right)}{N_T^2}\right]. \tag{16}$$

Reasoning that typically either $m$ is small or $p_i$ rapidly converges on $x_i$, we can neglect all but the first term of both $C_{\delta x_i \delta x_j}$ and $V_{\delta x_i}$. Equations (13)–(16) then define the NCM for large populations by describing the change in the joint probability of the relative abundances of the n different taxa in the local community.
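As a worked check of Eq. (14), the expected change in $N_i$ over one death can be assembled from Eqs. (8) and (10). The shorthand $\Delta N_i$ for that change is introduced here only for the sketch; the reproduction terms in $(1 - m)$ cancel exactly, leaving the immigration terms.

```latex
\begin{aligned}
E[\Delta N_i] &= \Pr(N_i + 1 \mid N_i) - \Pr(N_i - 1 \mid N_i) \\
&= \frac{N_T - N_i}{N_T}\, m p_i - \frac{N_i}{N_T}\, m (1 - p_i)
   + (1 - m)\left[\frac{(N_T - N_i) N_i}{N_T (N_T - 1)} - \frac{N_i (N_T - N_i)}{N_T (N_T - 1)}\right] \\
&= m\left[(1 - x_i)\, p_i - x_i (1 - p_i)\right] = m (p_i - x_i), \\
M_{\delta x_i} &= \frac{E[\Delta N_i]}{N_T} = \frac{m (p_i - x_i)}{N_T}.
\end{aligned}
```

The same bookkeeping applied to $\Pr(N_i + 1 \mid N_i) + \Pr(N_i - 1 \mid N_i)$ gives the numerator of Eq. (15), once the factor $N_T/(N_T - 1)$ on the reproduction term is approximated by 1 for large $N_T$.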

Stationary probability density function. The solution to the diffusion equation [Eq. (13)] with $\partial \phi / \partial t = 0$ and reflecting boundaries, where $x_i = 0$ or $x_i = 1$, gives the stationary (long-term equilibrium) joint probability density function (pdf) for the relative abundance of the n taxa in the local community, $\{x_i\}_{i=1}^{n}$. Here, we show that the joint pdf for a Dirichlet distribution,

$$\phi = \left[\frac{\Gamma(N_T m)}{\Gamma(N_T m p_1) \cdots \Gamma(N_T m p_n)}\right] x_1^{N_T m p_1 - 1}\, x_2^{N_T m p_2 - 1} \cdots x_n^{N_T m p_n - 1} \tag{17}$$

where $x_n = 1 - x_1 - \cdots - x_{n-1}$ and $p_n = 1 - p_1 - \cdots - p_{n-1}$, is a solution.

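Note that if

$$-\left(M_{\delta x_i} \phi\right) + \frac{1}{2} \frac{\partial (V_{\delta x_i} \phi)}{\partial x_i} + \frac{1}{2} \sum_{j \neq i} \frac{\partial (C_{\delta x_i \delta x_j} \phi)}{\partial x_j} = 0 \quad \text{for } i = 1, \ldots, n \tag{18}$$

then $\partial \phi / \partial t = 0$. Therefore, substituting in Eqs. (14)–(16), we require

$$\frac{m (p_i - x_i)}{N_T}\, \phi - \frac{1}{2} \frac{\partial}{\partial x_i}\left[\frac{2 x_i (1 - x_i)}{N_T^2}\, \phi\right] = \frac{1}{2} \sum_{j \neq i} \frac{\partial}{\partial x_j}\left[\frac{-2 x_i x_j}{N_T^2}\, \phi\right] \tag{19}$$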

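Substituting $\phi$ into the left-hand side of Eq. (19) gives

$$\begin{aligned}
\frac{m (p_i - x_i)}{N_T}\, \phi - \frac{\partial}{\partial x_i}\left[\frac{x_i (1 - x_i)}{N_T^2}\, \phi\right]
&= \frac{m (p_i - x_i)}{N_T}\, \phi - \frac{\phi}{N_T^2}\left[N_T m p_i - x_i (N_T m p_i + 1) - \frac{x_i (1 - x_i)(N_T m p_n - 1)}{x_n}\right] \\
&= \frac{\phi}{N_T^2}\left\{N_T m (p_i - x_i) - \left[N_T m p_i - x_i (N_T m p_i + 1) - \frac{x_i (1 - x_i)(N_T m p_n - 1)}{x_n}\right]\right\} \\
&= -\frac{\phi\, x_i}{N_T^2}\left[N_T m (1 - p_i) - 1 - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right]
\end{aligned} \tag{20}$$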


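Similarly, substituting $\phi$ into the right-hand side of Eq. (19) gives

$$\begin{aligned}
\sum_{j \neq i} \frac{\partial}{\partial x_j}\left[\frac{-x_i x_j}{N_T^2}\, \phi\right]
&= -\frac{x_i \phi}{N_T^2} \sum_{j \neq i}\left[N_T m p_j - \frac{x_j}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m (1 - p_i - p_n) - \frac{(1 - x_i - x_n)}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m (1 - p_i - p_n) + (N_T m p_n - 1) - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right] \\
&= -\frac{x_i \phi}{N_T^2}\left[N_T m (1 - p_i) - 1 - \frac{(1 - x_i)}{x_n}(N_T m p_n - 1)\right]
\end{aligned} \tag{21}$$

Now, because (20) and (21) are equal, $\phi$ is a solution to the diffusion equation [Eq. (13)] with $\partial \phi / \partial t = 0$ and the reflecting boundary conditions are met.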

Algorithm for generating the stationary probability density function. Given the relative abundances of n taxa in the source community, $\{p_i\}_{i=1}^{n}$, a realization of the Dirichlet distributed local abundances can be generated by sampling from a set of gamma distributions. Let $\{Y_i\}_{i=1}^{n}$ be random variables such that $Y_i \sim \mathrm{gamma}(N_T m p_i)$ and let $\{y_i\}_{i=1}^{n}$ be realizations of these variables sampled at random, then

$$x_i = \frac{y_i}{\sum_{j=1}^{n} y_j}, \quad i = 1, \ldots, n \tag{22}$$

will represent a random sample from the Dirichlet joint probability distribution for a local neutral community [Eq. (17)].
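The algorithm above can be sketched in a few lines of standard-library Python. The sketch reads the single-argument gamma($N_T m p_i$) as a gamma distribution with that shape and unit scale, which is an assumption (any common scale cancels in the ratio of Eq. (22)); the function name, the four-taxon source community and the seed are likewise illustrative.

```python
import random


def sample_local_community(source_abundances, n_t, m, seed=None):
    """Draw one realization of local relative abundances via Eq. (22).

    y_i ~ gamma(shape = N_T*m*p_i, scale = 1), then x_i = y_i / sum_j y_j.
    """
    if any(p <= 0.0 for p in source_abundances):
        raise ValueError("all source abundances must be positive")
    rng = random.Random(seed)
    y = [rng.gammavariate(n_t * m * p, 1.0) for p in source_abundances]
    total = sum(y)
    if total == 0.0:
        # Extremely small shape parameters can underflow to zero draws.
        raise ArithmeticError("all gamma draws underflowed to zero")
    return [y_i / total for y_i in y]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.15, 0.05]  # made-up source community for illustration
    print(sample_local_community(p, n_t=1e9, m=1.55e-9, seed=0))  # low immigration
    print(sample_local_community(p, n_t=1e9, m=1.0, seed=0))      # m = 1
```

With $m = 1$ the product $N_T m$ is large and the realization sits very close to the source abundances, whereas at the calibrated sewage works value individual realizations can depart strongly from them.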

Sampling a neutral community. We have already shown that for the continuous variant of the NCM, the steady-state joint pdf for all species is Dirichlet, $\mathrm{Dir}(N_T m p_1, \ldots, N_T m p_n)$, where $p_1, \ldots, p_n$ are the relative abundances of the species in the metacommunity.

We can repeat the exact same argument to derive the joint distribution of the relative abundances within a sample of size $N_S$ from such a community. Strictly speaking, selecting a subsample of size $N_S$ from a local community is achieved by simply sampling $N_S$ individuals without replacement from the community of size $N_T$. However, since for almost all microbial samples $N_S \ll N_T$, the problem can be approximated to one of sampling with replacement.

Regard the sampling exercise as a continuous process through time. Individuals are selected from the source community one by one until a sample of size $N_S$ has been collected. Once this sample size has been reached, the process of selecting individuals continues at regular intervals in time (generations) but now the selected individual replaces one randomly chosen individual currently in the sample population. This is analogous to the argument used for deriving the joint distribution for the local abundances, except that we have a pure immigration–death process, with immigrants into the sample from the local community. Setting $m = 1$ and regarding our local abundances as the metacommunity from which immigrants are drawn, it is clear that conditional on knowledge of local abundances $x_1, \ldots, x_n$ the joint distribution of relative abundances $y_1, \ldots, y_n$ within a sample is Dirichlet, $\mathrm{Dir}(N_S x_1, \ldots, N_S x_n)$. That is,

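$$f(Y \mid X) = \Gamma(N_S) \prod_{i=1}^{n} \frac{y_i^{N_S x_i - 1}}{\Gamma(N_S x_i)} \tag{23}$$

where $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ for notational convenience. This allows us to calculate the first and second moments of the sample distribution because we know that the marginal densities of a Dirichlet distribution are beta distributed. Therefore,

$$E(y_i \mid x_i) = x_i \tag{24}$$

and

$$E(y_i^2 \mid x_i) = \frac{x_i (N_S x_i + 1)}{N_S + 1} \tag{25}$$

Now, since $x_i \sim \mathrm{Beta}\left(N_T m p_i,\, N_T m (1 - p_i)\right)$, we have that

$$E(y_i) = p_i \tag{26}$$

and

$$\begin{aligned}
E(y_i^2) &= \left(\frac{1}{N_S + 1}\right)\left[N_S\, \frac{p_i (N_T m p_i + 1)}{N_T m + 1} + p_i\right] \\
&= \frac{N_S N_T m\, p_i^2 + (N_S + N_T m + 1)\, p_i}{N_S N_T m + N_T m + N_S + 1} \\
&= \frac{\dfrac{N_S N_T m}{N_T m + N_S + 1}\, p_i^2 + p_i}{\dfrac{N_S N_T m}{N_T m + N_S + 1} + 1}
\end{aligned} \tag{27}$$

letting

$$\hat{m} = \frac{N_T m}{N_T m + N_S + 1} \tag{28}$$

then

$$E(y_i^2) = \frac{N_S \hat{m}\, p_i^2 + p_i}{N_S \hat{m} + 1} \tag{29}$$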

We were unable to derive a neat analytical solution for the marginal pdfs of abundance in the sample. However, repeated sampling from neutrally assembled


synthetic communities confirmed that the marginals were very closely approximated by beta distributions. If we assume that the sample marginal distributions are exactly beta, then, as their first and second moments are given by Eqs. (26) and (29), respectively, the sample distribution is given by

$$y_i \sim \mathrm{Beta}\left(N_S \hat{m} p_i,\, N_S \hat{m} (1 - p_i)\right) \tag{30}$$
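The moment matching can be checked numerically. The sketch below computes the second moment of the sample abundance in two ways, first by averaging Eq. (25) over the beta-distributed local abundance, as in the first line of Eq. (27), and second from the beta distribution of Eq. (30) with the effective immigration probability of Eq. (28). The function names and the test values are assumptions chosen for illustration.

```python
def second_moment_two_stage(p_i, n_t, m, n_s):
    """E(y_i^2) from the first line of Eq. (27): Eq. (25) averaged over
    x_i ~ Beta(N_T*m*p_i, N_T*m*(1 - p_i))."""
    e_x2 = p_i * (n_t * m * p_i + 1.0) / (n_t * m + 1.0)  # E(x_i^2)
    return (n_s * e_x2 + p_i) / (n_s + 1.0)               # uses E(x_i) = p_i


def second_moment_beta(p_i, n_t, m, n_s):
    """E(y_i^2) from Eq. (29), i.e. for the beta distribution of Eq. (30)."""
    m_hat = n_t * m / (n_t * m + n_s + 1.0)               # Eq. (28)
    return (n_s * m_hat * p_i ** 2 + p_i) / (n_s * m_hat + 1.0)


if __name__ == "__main__":
    args = dict(p_i=0.2, n_t=1e9, m=1.55e-9, n_s=13)
    print(second_moment_two_stage(**args))
    print(second_moment_beta(**args))
```

Agreement of the two values confirms only that the first two moments match; it does not show that the sample distribution is exactly beta, which remains an assumption.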

References

1. Bell, G (2000) The distribution of abundance in neutral communities. Am Nat 155: 606–617

2. Bell, T, Agar, D, Song, J, Newman, JA, Thompson, IP, Lilley, AK, van der Gast, CJ (2005) Larger islands house more bacterial taxa. Science 308: 1884

3. Coskuner, G, Ballinger, SJ, Davenport, RJ, Pickering, RL, Solera, R, Head, IM, Curtis, TP (2005) Agreement between theory and measurement in quantification of ammonia-oxidizing bacteria. Appl Environ Microbiol 71: 6325–6334

4. Cox, DR, Miller, HD (1965) The Theory of Stochastic Processes. Methuen, London

5. Curtis, T, Sloan, WT, Scannell, J (2002) Modelling prokaryotic diversity and its limits. Proc Natl Acad Sci 99: 10494–10499

6. Curtis, TP, Sloan, WT (2005) Exploring microbial diversity—a vast below. Science 309: 1331–1333

7. Enquist, BJ, Sanderson, J, Weiser, MD (2002) Modeling macroscopic patterns in ecology. Science 295: 1835–1836

8. Fenchel, T, Finlay, BJ (2005) Bacteria and Island Biogeography. Science 309: 1997–1999

9. Finlay, BJ, Clarke, KJ (1999) Ubiquitous dispersal of microbial species. Nature 400: 828

10. Green, JL, Holmes, AJ, Westoby, M, Oliver, I, Briscoe, D, Dangerfield, M, et al. (2004) Spatial scaling of microbial eukaryote diversity. Nature 432: 747–750

11. Harris, LD (1984) The Fragmented Forest. University of Chicago Press

12. Horner-Devine, MC, Lage, M, Hughes, JB, Bohannan, BJM (2004) A taxa-area relationship for bacteria. Nature 432: 750–753

13. Houchmandzadeh, B, Vallade, M (2003) Clustering in neutral ecology. Phys Rev E 68: art. no. 061912

14. Hubbell, SP (2001) The Unified Neutral Theory of Biodiversity and Biogeography. Princeton University Press, Princeton

15. Kimura, M, Ohta, T (1971) Theoretical Aspects of Population Genetics. Princeton University Press, Princeton

16. Linacre, CH (2004) Diversity and the quantification of ammonia oxidising bacteria and denitrification from turbidity maximum of estuaries. PhD thesis, Civil Engineering and Geosciences, University of Newcastle upon Tyne

17. MacArthur, RH, Wilson, EO (Eds.) (1967) The Theory of Island Biogeography. Princeton University Press, Princeton

18. May, RM (1975) Patterns of species abundance and diversity. In: Cody, ML, Diamond, JM (Eds.), Ecology and Evolution of Communities. Harvard University Press, Harvard, MA, pp 81–120

19. McGill, BJ (2003) A test of the unified neutral theory of biodiversity. Nature 422: 881–885

20. McKane, AJ, Alonso, D, Sole, RV (2004) Analytic solution of Hubbell's model of local community dynamics. Theor Popul Biol 65: 67–73

21. Purkhold, U, Pommerening-Roser, A, Juretschko, S, Schmid, MC, Koops, HP, Wagner, M (2000) Phylogeny of all recognized species of ammonia oxidizers based on comparative 16S rRNA and amoA sequence analysis: implications for molecular diversity surveys. Appl Environ Microbiol 66: 5368–5382

22. Sloan, WT, Woodcock, S, Lunn, M, Head, IM, Nee, S, Curtis, TP (2005) The roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol, Early Online 28 Nov

23. Sloan, WT, Lunn, M, Woodcock, S, Head, IM, Nee, S, Curtis, TP (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8: 732–740

24. Vallade, M, Houchmandzadeh, B (2003) Analytical solution of a neutral model of biodiversity. Phys Rev E 68: art. no. 061902

25. Volkov, I, Banavar, JR, Hubbell, SP, Maritan, A (2003) Neutral theory and relative species abundance in ecology. Nature 424: 1035–1037

26. Wagner, M, Loy, A (2002) Bacterial community composition and function in sewage treatment systems. Curr Opin Biotechnol 13: 218–227

27. Whitman, WB, Coleman, DC, Wiebe, WJ (1998) Prokaryotes: the unseen majority. Proc Natl Acad Sci USA 95: 6578–6583

28. Woodcock, S, Lunn, M, Curtis, TP, Head, IM, Sloan, WT (2006) Taxa area relationships for microbes: the unsampled and the unseen. Ecol Lett 9: 805–812

29. Zwart, G, van Hannen, EJ, Kamst-van Agterveld, MP, van der Gucht, K, Lindstrom, ES, van Wichelen, J, et al. (2003) Rapid screening for freshwater bacterial groups by using reverse line blot hybridization. Appl Environ Microbiol 69: 5875–5883
