A Bayesian method for DNA barcoding
Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen
University of Copenhagen
Varieties of barcoding
• Assignment to existing species.
• Identification of new species.
• Assignment to taxonomic levels in general
Motivation
1. Environmental aDNA samples.
2. Putative Neandertal DNA.
• Often short query sequences.
  – Little information.
• Permissive PCR conditions.
  – Not always from the intended locus.
Given a set of database reference sequences from different species, according to which criteria should we assign new query sequences to taxonomic levels?
True species assignment
• Requires proper population genetic analyses quantifying variability within species.
• Often not possible:
  – Small database sample size for each species.
  – Short query PCR products.
Phylogenetic alternative
- Purely phylogenetic criteria which ignore population genetic problems.
- Taxonomic annotation of database sequences is used to map phylogenetic groups to taxonomic levels.
- The simpler approach has its own advantages: less data required and fewer assumptions.
[Figure: a query sequence placed relative to a monophyletic taxonomic group. Ingroup or outgroup?]
Estimating trees
• Estimation of a single tree is not sufficient because of the uncertainty regarding the phylogeny.
• We suggest instead a Bayesian approach, which quantifies this uncertainty.
Bayesian approach
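• Let Q be the query sequence, X the database data, G a gene tree, and F a desired taxonomic group. Then

$$\Pr(Q \in F \mid X) = \int_G I(Q, F \text{ monophyletic in } G)\, p(G \mid X)\, dG \approx \frac{1}{k} \sum_{i=1}^{k} I(Q, F \text{ monophyletic in } G_i)$$

where I(·) is the indicator function and G_i is the ith gene tree sampled from p(G | X).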
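The approximation over sampled trees can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes rooted trees written as nested tuples of leaf names, and it reads "Q, F monophyletic in G" as "the query plus all database sequences annotated to F form one clade with no other members"; both are assumptions made for illustration. The names `clades`, `is_monophyletic`, `posterior_assignment`, and the toy trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (illustrative format).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every subtree of `tree` to `out`; return the tree's own leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def is_monophyletic(tree: Tree, query: str, group: Iterable[str]) -> bool:
    """Indicator I: True if the query and the group's sequences form exactly one clade."""
    found: Set[FrozenSet[str]] = set()
    clades(tree, found)
    return (frozenset(group) | {query}) in found


def posterior_assignment(trees: Sequence[Tree], query: str, group: Iterable[str]) -> float:
    """Fraction of sampled trees in which the indicator is 1."""
    members = frozenset(group)
    hits = sum(is_monophyletic(tree, query, members) for tree in trees)
    return hits / len(trees)


# Four made-up "sampled" trees; a1, a2 belong to group A and b1, b2 to group B.
trees = [
    ((("Q", "a1"), "a2"), ("b1", "b2")),
    (("Q", ("a1", "a2")), ("b1", "b2")),
    ((("a1", "a2"), "b1"), ("Q", "b2")),
    (("a1", "a2"), ("Q", ("b1", "b2"))),
]

print(posterior_assignment(trees, "Q", {"a1", "a2"}))  # 0.5
print(posterior_assignment(trees, "Q", {"b1", "b2"}))  # 0.25
```

In practice the trees would come from an MCMC sample, which is typically unrooted; the sketch sidesteps rooting by assuming it has already been done.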
Assignment pipeline
• Query sequence → NCBI BLAST search of the database (GenBank), with retrieval of sequences and taxonomy annotation → homology set
• Homology set → ClustalW → alignment
• Alignment → MrBayes → sampled trees
• Sampled trees → summary statistics → taxonomy summary
Summary statistics
• For each tree:
  – Find the sister group to the query.
  – Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy).
• For each name at each taxonomic level:
  – Find the fraction of sampled trees where the consensus taxonomy includes that name.
[Figure: a sampled tree showing the query and its sister group]
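The two steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: trees are rooted and written as nested tuples of leaf names, the taxonomy annotation is a plain dictionary from sequence name to rank/name pairs, and the sequence labels and trees are made-up toy data, not results from the study. The function names are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Return all leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sister_group(tree, query):
    """Return the leaves of the query's sister group, or None if the query is absent."""
    if isinstance(tree, str):
        return None
    if query in tree:  # the query is a direct child of this node
        return [leaf for child in tree if child != query for leaf in leaves(child)]
    for child in tree:
        found = sister_group(child, query)
        if found is not None:
            return found
    return None


def consensus_taxonomy(group, taxonomy):
    """Return the (rank, name) pairs shared by every sequence in the group."""
    shared = set(taxonomy[group[0]].items())
    for name in group[1:]:
        shared &= set(taxonomy[name].items())
    return shared


def taxonomy_summary(trees, query, taxonomy):
    """For each (rank, name), the fraction of trees whose consensus taxonomy includes it."""
    counts = Counter()
    for tree in trees:
        counts.update(consensus_taxonomy(sister_group(tree, query), taxonomy))
    return {key: n / len(trees) for key, n in counts.items()}


# Toy annotation and four made-up sampled trees.
taxonomy = {
    "pinus1": {"family": "Pinaceae", "genus": "Pinus"},
    "pinus2": {"family": "Pinaceae", "genus": "Pinus"},
    "picea1": {"family": "Pinaceae", "genus": "Picea"},
    "taxus1": {"family": "Taxaceae", "genus": "Taxus"},
}
trees = [
    ((("Q", "pinus1"), "pinus2"), ("picea1", "taxus1")),
    (("Q", ("pinus1", "pinus2")), ("picea1", "taxus1")),
    ((("Q", "picea1"), ("pinus1", "pinus2")), "taxus1"),
    (("Q", ("pinus1", "picea1")), ("pinus2", "taxus1")),
]

summary = taxonomy_summary(trees, "Q", taxonomy)
for (rank, name), fraction in sorted(summary.items()):
    print(rank, name, fraction)
```

In this toy sample the family Pinaceae appears in the consensus taxonomy of every tree, while the genus is split between Pinus and Picea, so only the family-level assignment would pass a high threshold.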
Example taxonomy summary
Environmental Samples
• 379 environmental samples (aDNA)
• rbcL and trnL markers.
• The aim is to identify the environmental flora.
Orders (posterior probability >90%)
Asterales Brassicales Caryophyllales Coniferales
Dipsacales Ericales Fabales Fagales
Lamiales Lepidoptera Malpighiales Poales
Pottiales Ranunculales Rosales Sapindales
Saxifragales Solanales Zingiberales
Families (posterior probability >90%)
Amaranthaceae Asteraceae Betulaceae Brassicaceae
Caprifoliaceae Caryophyllaceae Ericaceae Fabaceae
Fagaceae Juncaceae Musaceae Papaveraceae
Pinaceae Plantaginaceae Poaceae Rosaceae
Rutaceae Salicaceae Saxifragaceae Solanaceae
Taxaceae Theaceae
Genera (posterior probability >90%)
Achillea Alnus Aruncus Cerastium
Fagus Musa Picea Pinus
Plantago Poa Saxifraga Symphoricarpos
Taxus
Botanical evaluation
• Temperate climate similar to central Sweden.
Testing putative Neandertal DNA
• Needless to say, we have had several negative examples.
• One positive example:
  – Posterior probability of 91%.
Problems
• No population genetic modelling:
  – Outgroup problem.
  – Species issues are not addressed.
  – Lineage sorting: no reciprocal monophyly.
• Incomplete database.
Advantages
• Phylogenetic uncertainty and statistical uncertainty of assignment are addressed.
• Posterior probability of assignment.
• Alternative to single tree assignment.
• Can be used on any database.
Conclusions
• Phylogenetic barcoding does not model the coalescent process.
• It is the appropriate method for assignment with little data, or when assigning to higher taxonomic levels.
• Bayesian approach offers a measure of confidence in assignment.