The use of short-read next generation sequences to recover the evolutionary histories in multi-individual samples
Systematic biology presentation, Yuantong Ding, Dec. 6
Outline
• Background
• Workflow
• Sequence comparison
• Tree comparison
• Summary & future work
Can short reads successfully recover phylogeny?
• Next-generation sequencing (NGS): low cost, high throughput, short reads
Multi-individual sample → short reads → reconstructed sequences → phylogeny?
Simulation process
• Original genealogy and original haplotypes: simulated by SerialSimCoal under a coalescent model
• Consensus sequence (mapping reference) and short reads: short reads simulated by MetaSim with the 454 error model
• Mapping: alignment built by SHRiMP and SSAHA
• Reconstructed haplotypes: reconstructed by ShoRAH
• NJ trees (original and reconstructed haplotypes): built by PAUP*
• Comparison: tree topology; number and similarity of haplotypes
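The two haplotype comparisons (how many distinct haplotypes there are, and how close they are to the originals) can be sketched in Python. This is a minimal sketch under stated assumptions: original and reconstructed haplotypes are taken to be aligned to the same length, similarity is measured as an uncorrected p-distance (the presentation does not say which measure was used), and the function names and toy sequences are hypothetical.

```python
def count_haplotypes(seqs):
    """Number of distinct haplotypes (identical sequences count once)."""
    return len(set(seqs))


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: uncorrected p-distance stands in for the unspecified similarity measure.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_matches(original, reconstructed):
    """For each original haplotype, the most similar reconstructed haplotype
    (ties go to the first one listed)."""
    return [min(reconstructed, key=lambda r: p_distance(o, r)) for o in original]


# Hypothetical toy data, not simulation output.
original = ["ACGTACGT", "ACGTACGA", "TCGTACGA"]
reconstructed = ["ACGTACGT", "ACGAACGA", "TCGTACGA", "TCGTACGT"]

print(count_haplotypes(original), count_haplotypes(reconstructed))  # 3 4
print(best_matches(original, reconstructed))
```

As the toy data show, two original haplotypes can be matched to the same reconstructed sequence, so a best-match selection does not by itself guarantee one distinct sequence per original haplotype.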
Six parameters were varied:
• Effective population size N
• Sample size n
• Mutation rate μ
• Sequence length l
• Number of short reads Sr_N
• Length of short reads Sr_l

| N | n | μ | l | Sr_N | Sr_l |
|---|---|---|---|---|---|
| 3000 | 10 | 5.00E-05 | 1200 | 5000 | 200 |
| 5000 | 20 | 1.00E-05 | 2000 | 10000 | 400 |
| 10000 | 40 | 5.00E-06 | 5000 | 30000 | |

All 486 combinations of these parameter values were simulated (3 × 3 × 3 × 3 × 3 × 2 = 486).
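The count of 486 follows from a full factorial design over the parameter levels in the table. A short Python sketch that enumerates the grid is below; the dictionary layout and key names are illustrative, since how the original simulations were scripted is not stated.

```python
from itertools import product

# Levels of each simulation parameter, copied from the parameter table.
levels = {
    "N": [3000, 5000, 10000],      # effective population size
    "n": [10, 20, 40],             # sample size
    "mu": [5e-5, 1e-5, 5e-6],      # mutation rate
    "l": [1200, 2000, 5000],       # sequence length
    "Sr_N": [5000, 10000, 30000],  # number of short reads
    "Sr_l": [200, 400],            # short-read length
}

# Full factorial design: one dict per parameter combination.
grid = [dict(zip(levels, values)) for values in product(*levels.values())]

print(len(grid))  # 486
```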
Different numbers of haplotypes in original and reconstructed samples
Similar sequences
Can reconstructed haplotypes still capture some phylogenetic information?
• Because the reconstructed samples contain a different number of haplotypes, the true phylogenetic trees cannot be recovered.
Assuming the true number of haplotypes n in the sample is known, two approaches were tried:
• Approach 1: select the reconstructed sequences most similar to the original haplotypes, build a phylogenetic tree from them, and calculate the symmetric difference relative to the original tree.
• Approach 2: cluster the reconstructed haplotypes into n groups with k-means, build a tree from the consensus sequence of each group, and calculate tree balance statistics.
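Approach 2 can be sketched with a plain NumPy k-means. The presentation does not give implementation details, so the following are assumptions: reconstructed haplotypes are aligned to equal length; sequences are one-hot encoded, so squared Euclidean distance equals twice the Hamming distance; k-means is seeded deterministically by farthest-point selection; and a group's consensus is its majority base at each site. Function names and toy data are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seqs):
    """Encode aligned A/C/G/T sequences as flat 0/1 vectors of length 4 * len(seq)."""
    idx = np.array([[BASES.index(b) for b in s] for s in seqs])
    return np.eye(4)[idx].reshape(len(seqs), -1)


def farthest_first(X, k):
    """Deterministic seeding (an assumption): start from row 0, then add the farthest row."""
    chosen = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[chosen][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    return X[chosen].copy()


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means; returns one cluster label per row of X."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its rows (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def consensus(seqs):
    """Majority base at each site (ties go to the earlier base in ACGT order)."""
    counts = one_hot(seqs).reshape(len(seqs), -1, 4).sum(axis=0)
    return "".join(BASES[i] for i in counts.argmax(axis=1))


def cluster_consensus(seqs, n):
    """Cluster sequences into n groups and return one consensus per non-empty group."""
    labels = kmeans(one_hot(seqs), n)
    groups = [[s for s, lab in zip(seqs, labels) if lab == j] for j in range(n)]
    return [consensus(g) for g in groups if g]


# Hypothetical toy data: two well-separated groups of reconstructed haplotypes.
recon = ["AAAAAAAA", "AAAAAAAC", "AAAAAACA", "TTTTTTTT", "TTTTTTTG", "TTTTTTGT"]
print(cluster_consensus(recon, 2))  # ['AAAAAAAA', 'TTTTTTTT']
```

The consensus sequences would then go to tree building in place of the reconstructed haplotypes, giving exactly n tips to compare with the original tree.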
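Method for tree comparison
• Symmetric difference, for rooted and labeled trees. Example: tree 1 (tips A, B, C) has clades (BC) and (ABC); tree 2 (tips B, A, C) has clades (AC) and (ABC). Clades (BC) and (AC) each occur in only one tree, so the symmetric difference = 2.
• Tree balance statistics, for rooted and unlabeled trees: $\bar{N} = \frac{1}{n}\sum_{i=1}^{n} N_i$, where $N_i$ is the number of internal nodes between tip i and the root.
e.g. for i = A, $N_A = 2$; over all five tips, $\bar{N} = (2+2+2+3+3)/5 = 2.4$.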
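Both statistics can be computed from a rooted tree written as nested tuples. In this sketch (function names hypothetical) the symmetric difference counts clades found in only one of the two trees, and $N_i$ counts internal nodes from tip i up to and including the root, which is the convention that reproduces the slide's depths. The five-tip tree used for $\bar{N}$ is an assumed shape with depths 2, 2, 2, 3, 3, since the slide's figure is not preserved.

```python
def clades(tree):
    """Clades (frozensets of tip labels) of all internal nodes of a rooted tree.

    Trees are nested tuples of tip labels, e.g. ("A", ("B", "C")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def symmetric_difference(t1, t2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


def tip_depths(tree, depth=0):
    """N_i for each tip: internal nodes between tip i and the root, root included."""
    if isinstance(tree, str):
        return {tree: depth}
    depths = {}
    for child in tree:
        depths.update(tip_depths(child, depth + 1))
    return depths


def mean_depth(tree):
    """N-bar: the average of N_i over all tips."""
    depths = tip_depths(tree)
    return sum(depths.values()) / len(depths)


# The two three-tip trees from the example: (A,(B,C)) and (B,(A,C)).
print(symmetric_difference(("A", ("B", "C")), ("B", ("A", "C"))))  # 2

# An assumed five-tip tree whose tip depths are 2, 2, 2, 3, 3.
five = (("A", "B"), ("C", ("D", "E")))
print(tip_depths(five), mean_depth(five))  # {'A': 2, 'B': 2, 'C': 2, 'D': 3, 'E': 3} 2.4
```

Because the symmetric difference compares labeled clades, it needs the same tip labels in both trees; $\bar{N}$ ignores labels and only depends on tree shape.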
Trees built from the most similar sequences differ in topology from the original trees
k-means cluster trees differ from the original trees in balance statistics

| n | N_bar (original) | N_bar (reconstructed) | P | I_c (original) | I_c (reconstructed) | P |
|---|---|---|---|---|---|---|
| 10 | 4.8 | 4.7 | 0.002 | 0.74 | 0.67 | 0.0004 |
| 20 | 7.5 | 6.9 | 9.2e-09 | 0.57 | 0.47 | 1.52e-10 |
| 40 | 10.6 | 9.6 | 1.2e-08 | 0.40 | 0.33 | 1.94e-09 |
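The presentation does not define I_c. Assuming it is Colless's index normalized by its maximum for n tips, $I_c = \frac{\sum_{v} |T_L(v) - T_R(v)|}{(n-1)(n-2)/2}$, where the sum runs over internal nodes v and $T_L$, $T_R$ are the tip counts of their two subtrees, a sketch for bifurcating trees in nested-tuple form (function names hypothetical):

```python
def tip_count(tree):
    """Number of tips in a nested-tuple tree."""
    return 1 if isinstance(tree, str) else sum(tip_count(child) for child in tree)


def colless(tree):
    """Sum over internal nodes of |tips in left subtree - tips in right subtree|.

    Assumes a fully bifurcating tree (every internal node has exactly two children).
    """
    if isinstance(tree, str):
        return 0
    left, right = tree
    return abs(tip_count(left) - tip_count(right)) + colless(left) + colless(right)


def normalized_colless(tree):
    """Colless index divided by its maximum, (n - 1)(n - 2) / 2, for n >= 3 tips."""
    n = tip_count(tree)
    if n < 3:
        raise ValueError("normalized Colless index needs at least 3 tips")
    return colless(tree) / ((n - 1) * (n - 2) / 2)


caterpillar = (((("A", "B"), "C"), "D"), "E")  # maximally unbalanced
five = (("A", "B"), ("C", ("D", "E")))
print(normalized_colless(caterpillar))  # 1.0
print(normalized_colless(five))
```

Under this definition a caterpillar tree scores 1 and more balanced trees score closer to 0.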
Summary & future work
• Haplotype reconstruction typically failed to estimate the correct number of haplotypes.
• Consequently, it was not possible to recover the true phylogenetic trees.
• Even when the true number of haplotypes is assumed known, the chance of recovering the true tree topology is still small.
• Future work: other reconstruction methods; using multiple reference sequences when mapping…
References
• Anderson, C.N.K., Ramakrishnan, U., et al. 2005. Serial SimCoal: A population genetic model for data from multiple populations and points in time. Bioinformatics 21, 1733-1734.
• Johnson, P.L., Slatkin, M. 2006. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Research 16, 1320-1327.
• Richter, D.C., Ott, F., et al. 2008. MetaSim: A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3, e3373.
• Suzuki, S., Ono, N., Furusawa, C., Ying, B.-W., Yomo, T. 2011. Comparison of Sequence Reads Obtained from Three Next-Generation Sequencing Platforms. PLoS ONE 6, e19534.
• Zagordi, O., Bhattacharya, A., et al. 2011. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119.
• David, M., Dzamba, M., et al. 2011. SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics 27(7), 1011-1012.
• Ning, Z., Cox, A.J., Mullikin, J.C. 2001. SSAHA: a fast search method for large DNA databases. Genome Research 11, 1725-1729.