Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences
Computational Tools for Metagenomics
Surya Saha Twitter: @SahaSurya / LinkedIn: www.linkedin.com/in/suryasaha/
Magdalen Lindeberg Plant Pathology & Plant-Microbe Biology
Microbial Friends & Foes, Sep 25, 2012
Impact of Technology on Metagenomics
Temperton, Current Opinion in Microbiology, 2012
Types of “Meta” genomics
16S rRNA survey of bacterial microbiome
ITS survey of fungal microbiome
Bellemain, BMC Microbiology, 2010
Slide: Julien Tremblay, JGI
Types of “Meta” genomics
Whole genome shotgun
• Varying complexity of microbial communities
• High coverage sequencing
• Sophisticated informatics
• Host associated metagenomes
– Deep sequencing of host metagenome
– Bioinformatic screening of host sequences
• Environmental metagenomes
– E.g. soil samples
– Requires very high depth of coverage
– Complicated to assemble
Big picture!!
What users see
What users want!!
16S/ITS community surveys
• Multiple target regions in the 16S gene and ITS region
• Comparison of results requires amplification of the same region
• Advantages
– Fast survey of large communities
– Mature set of tools and statistics for analysis
– Good for a first-round survey
• 454 16S tags or pyrotags (~700 bp) have been the preferred method
• Illumina MiSeq reads (2x150 bp, 2x250 bp) are the next workhorse
• Depth of sampling
– 2,000-6,000 reads/sample for simple communities
– 20,000 reads/sample for complex soil metagenomes
16S/ITS issues
• Lack of tools for processing ITS/fungal microbiome data sets
– RDP classifier does not target ITS
– No ITS reconstruction tools
• Amplification bias affects accuracy and replication
• Use of short reads prevents disambiguation of similar strains
• 16S or ITS may not differentiate between similar strains
– Clustering is done at 97% identity
– Regions may be >99% similar
• Sequencing error inflates the number of OTUs
• Chloroplast 16S sequences can get amplified in plant metagenomes
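To see why a 97% clustering cutoff cannot separate strains whose amplicons are >99% similar, here is a minimal sketch that computes percent identity between two sequences assumed to be already aligned to the same length. The helper names (`percent_identity`, `same_otu`, `mutate`) and the 200 bp toy amplicon are hypothetical; real clustering tools compute identity from their own alignments and may define it slightly differently.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def same_otu(a: str, b: str, threshold: float = 97.0) -> bool:
    """True if the two sequences would fall into one OTU at this identity cutoff."""
    return percent_identity(a, b) >= threshold


def mutate(seq: str, positions) -> str:
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


# Hypothetical 200 bp amplicon from strain A
strain_a = "ACGT" * 50
# Strain B differs at 2 of 200 positions (99% identity)
strain_b = mutate(strain_a, [10, 150])
# A more distant organism differs at 8 of 200 positions (96% identity)
distant = mutate(strain_a, range(0, 200, 25))

print(percent_identity(strain_a, strain_b), same_otu(strain_a, strain_b))  # → 99.0 True
print(percent_identity(strain_a, distant), same_otu(strain_a, distant))    # → 96.0 False
```

The two strains end up in one OTU even though they are different organisms, which is exactly the resolution limit listed above.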
16S/ITS sequence processing workflow
Filter for contaminants and low quality reads
Assemble overlapping reads
Reduce datasets (clustering)
Perform taxonomic classification and compute diversity metrics
16S/ITS workflow step: Filter for contaminants and low quality reads
• Quality plots and read trimming
– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
– FASTX http://hannonlab.cshl.edu/fastx_toolkit/
• Chimera removal
– AmpliconNoise http://code.google.com/p/ampliconnoise/
– UCHIME http://www.drive5.com/uchime/
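Read trimming in the filtering step usually means cutting low-quality bases off the 3' end and discarding reads that become too short. The sketch below shows that idea for Phred+33 quality strings. It is similar in spirit to trailing-quality trimming in tools such as the FASTX-Toolkit but is not their code or options; the function names and the default thresholds (`min_q=20`, `min_len=50`) are assumptions for illustration.

```python
from typing import Iterator, List, Optional, Tuple


def phred33_scores(qual: str) -> List[int]:
    """Decode a Sanger/Illumina 1.8+ (Phred+33) quality string."""
    return [ord(ch) - 33 for ch in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20,
                min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Remove bases from the 3' end while their quality is below min_q.

    Returns None if fewer than min_len bases remain (the read is discarded).
    """
    scores = phred33_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


def trim_all(reads: List[Tuple[str, str]], **kwargs) -> Iterator[Tuple[str, str]]:
    """Yield trimmed (seq, qual) pairs, dropping reads that become too short."""
    for seq, qual in reads:
        trimmed = trim_3prime(seq, qual, **kwargs)
        if trimmed is not None:
            yield trimmed


reads = [
    ("ACGT" * 15, "I" * 55 + "#" * 5),   # 'I' = Q40, '#' = Q2: 5 poor bases at the 3' end
    ("ACGT" * 15, "I" * 40 + "#" * 20),  # only 40 good bases: too short after trimming
]
kept = list(trim_all(reads))
print(len(kept), len(kept[0][0]))  # → 1 55
```

Chimera removal (AmpliconNoise, UCHIME) is a separate step after trimming and is not covered by this sketch.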
Impact of Sequence Length
Slide: Feng Chen, JGI
16S/ITS workflow step: Assemble overlapping reads
• Merge overlapping paired end reads
– FLASH http://www.genomics.jhu.edu/software/FLASH/index.shtml
– FastqJoin http://code.google.com/p/ea-utils/wiki/FastqJoin
– CD-HIT read-linker http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit-auxtools-manual
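Paired-end merging works when the amplicon is shorter than the two reads combined, so the 3' end of read 1 overlaps the reverse complement of read 2. This is a minimal sketch of that idea: it tries every overlap length from `min_overlap` up, keeps the one with the lowest mismatch fraction (ties go to the longer overlap), and rejects pairs above `max_mismatch`. The function names and defaults are hypothetical. Tools like FLASH or fastq-join also use base qualities in the overlap and handle more cases; this sketch just keeps read 1's bases there.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch: float = 0.1) -> Optional[str]:
    """Merge a read pair whose fragment is shorter than the two reads combined.

    r2 is given as sequenced (opposite strand) and is reverse-complemented first.
    Returns the merged fragment, or None if no acceptable overlap is found.
    """
    r2rc = revcomp(r2)
    best = None  # (mismatch fraction, -overlap length): lower is better
    best_ov = 0
    for ov in range(min_overlap, min(len(r1), len(r2rc)) + 1):
        tail = r1[-ov:]   # 3' end of read 1
        head = r2rc[:ov]  # 5' end of reverse-complemented read 2
        frac = sum(x != y for x, y in zip(tail, head)) / ov
        key = (frac, -ov)
        if frac <= max_mismatch and (best is None or key < best):
            best, best_ov = key, ov
    if best is None:
        return None
    # In the overlap this sketch simply keeps read 1's bases
    return r1 + r2rc[best_ov:]


fragment = "GATTACACCGGTAAGCTTCAGGCATTCGAATCCGTA"  # hypothetical 36 bp amplicon
r1 = fragment[:24]
r2 = revcomp(fragment[12:])  # read 2 comes off the opposite strand
print(merge_pair(r1, r2) == fragment)  # → True
```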
16S/ITS workflow step: Reduce datasets (clustering)
• Clustering with high stringency
– UCLUST/USEARCH (16S only) http://www.drive5.com/usearch/
– CD-HIT-OTU (16S only) http://weizhong-lab.ucsd.edu/cd-hit-otu/
– PhylOTU (16S only) https://github.com/sharpton/PhylOTU
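Greedy centroid clustering, the general approach behind UCLUST-style OTU picking, can be sketched in a few lines: sort unique sequences by abundance, then assign each one to the first centroid it matches at the identity threshold, or make it a new centroid. This is a simplified illustration, not UCLUST's algorithm: it assumes all reads were trimmed to the same length and uses ungapped identity, and `greedy_cluster` and `identity` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes reads were trimmed to one length."""
    if len(a) != len(b):
        raise ValueError("this sketch expects equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: Iterable[str], threshold: float = 0.97) -> Dict[str, int]:
    """Greedy centroid clustering; returns {centroid sequence: read count}."""
    counts = Counter(reads)
    # Most abundant unique sequences are considered first, so they become centroids
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    otus: Dict[str, int] = {}
    for seq in ordered:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += counts[seq]
                break
        else:
            otus[seq] = counts[seq]  # no centroid close enough: start a new OTU
    return otus


base = "ACGT" * 25      # 100 bp toy sequence
variant = "T" + base[1:]  # one substitution (e.g. a sequencing error): 99% identical
other = "TGCA" * 25     # unrelated sequence
reads = [base] * 10 + [variant] * 3 + [other] * 5

print(len(greedy_cluster(reads)))                 # → 2
print(len(greedy_cluster(reads, threshold=1.0)))  # → 3
```

At 100% identity the single-base variant becomes its own OTU, which illustrates how sequencing error inflates OTU counts unless clustering (or denoising) absorbs it.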
16S/ITS workflow step: Perform taxonomic classification and compute diversity metrics
• Composition based classifiers
– RDP database + classifier http://rdp.cme.msu.edu/classifier/classifier.jsp
• Homology based classifiers
– ARB + Silva database (16S only) http://www.arb-home.de/
– GreenGenes database (16S only) http://greengenes.lbl.gov/cgi-bin/nph-index.cgi
– UNITE database (ITS only) http://unite.ut.ee/
– FungalITSPipeline (ITS only) http://www.emerencia.org/fungalitspipeline.html
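A composition based classifier scores a read by the short words (k-mers) it contains. The sketch below is modeled on the naive Bayesian scheme described for the RDP Classifier (Wang et al., 2007): a word prior P(w) = (n(w) + 0.5) / (N + 1) and a genus-conditional probability (m(w) + P(w)) / (M + 1), summed in log space over the query's words. It omits RDP's bootstrap confidence estimate and works only at genus level; the class name and toy training data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct overlapping words of length k (presence/absence, not counts)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesKmerClassifier:
    """Genus-level naive Bayes on k-mer presence, modeled on the RDP scheme."""

    def __init__(self, k: int = 8):
        self.k = k

    def fit(self, seqs: List[str], genera: List[str]) -> "NaiveBayesKmerClassifier":
        self.n_seqs = len(seqs)                             # N
        self.word_seqs: Dict[str, int] = defaultdict(int)   # n(w)
        self.genus_seqs: Dict[str, int] = defaultdict(int)  # M for each genus
        self.genus_word = defaultdict(lambda: defaultdict(int))  # m(w) per genus
        for seq, genus in zip(seqs, genera):
            self.genus_seqs[genus] += 1
            for w in kmers(seq, self.k):
                self.word_seqs[w] += 1
                self.genus_word[genus][w] += 1
        return self

    def _word_prior(self, w: str) -> float:
        # P(w) = (n(w) + 0.5) / (N + 1)
        return (self.word_seqs.get(w, 0) + 0.5) / (self.n_seqs + 1)

    def log_likelihood(self, seq: str, genus: str) -> float:
        # log P(S|G) = sum over words of log[(m(w) + P(w)) / (M + 1)]
        m_total = self.genus_seqs[genus]
        counts = self.genus_word[genus]
        return sum(math.log((counts.get(w, 0) + self._word_prior(w)) / (m_total + 1))
                   for w in kmers(seq, self.k))

    def predict(self, seq: str) -> str:
        return max(self.genus_seqs, key=lambda g: self.log_likelihood(seq, g))


# Toy reference set; k=4 only because the sequences are tiny (RDP uses 8-mers)
train = ["ACGTACGTACGT", "ACGTACGTACGA", "TTGGCCAATTGG", "TTGGCCAATTGC"]
labels = ["GenusA", "GenusA", "GenusB", "GenusB"]
clf = NaiveBayesKmerClassifier(k=4).fit(train, labels)
print(clf.predict("ACGTACGTAC"), clf.predict("TTGGCCAATT"))  # → GenusA GenusB
```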
QIIME
• http://www.qiime.org/
• Comprehensive suite of tools
– OTU picking
– Taxonomic classification
– Construction of phylogenetic trees
– Visualization
– Compute diversity statistics
• Available as Amazon EC2 image
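The diversity statistics computed at the end of the workflow start from an OTU table (read counts per OTU per sample). A minimal sketch of three common alpha diversity measures follows: observed OTUs, the Shannon index and the Gini-Simpson index. The log base for Shannon differs between tools (base 2 is used here as an assumption), so check which one your pipeline reports; the sample names and counts are hypothetical.

```python
import math
from typing import Dict, List


def observed_otus(counts: List[int]) -> int:
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: List[int], base: float = 2.0) -> float:
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts: List[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance two reads drawn with replacement differ in OTU."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical OTU table: sample -> read counts per OTU
otu_table: Dict[str, List[int]] = {
    "even_sample": [25, 25, 25, 25],
    "skewed_sample": [97, 1, 1, 1],
}
for sample, counts in otu_table.items():
    print(sample, observed_otus(counts), round(shannon(counts), 3), round(simpson(counts), 3))
```

Both samples have four observed OTUs, but the skewed one scores much lower on Shannon and Simpson. Because these values depend on sequencing depth, samples are commonly rarefied to an equal number of reads before comparison.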
Whole Genome Shotgun (WGS) Metagenomics
• Better classification with the increasing number of complete genomes
• Focus on whole genome based phylogeny (whole genome phylotyping)
• Advantages
– No amplification bias, unlike 16S/ITS
• Issues
– Poor sampling of fungal diversity
– Assembly of metagenomes is complicated due to uneven coverage
– Requires high depth of coverage
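Uneven coverage and the need for high depth follow from simple arithmetic: a community member's coverage scales with its share of the reads. This sketch uses a Lander-Waterman style estimate and assumes uniform random sampling, which real metagenomes (GC bias, strain variation) violate. "Abundance" here means the fraction of reads coming from that genome; the genome size, read length and abundances are hypothetical.

```python
import math


def expected_coverage(n_reads: int, read_len: int, abundance: float, genome_size: int) -> float:
    """Mean depth for one member: bases sequenced from it divided by its genome size."""
    return n_reads * read_len * abundance / genome_size


def reads_needed(target_cov: float, read_len: int, abundance: float, genome_size: int) -> int:
    """Total reads needed so a member at this read fraction reaches target_cov."""
    return math.ceil(target_cov * genome_size / (read_len * abundance))


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman estimate of the genome fraction hit by at least one read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical community: 5 Mb genomes, 150 bp reads
n = reads_needed(10, 150, 0.01, 5_000_000)          # rare member at 1% of reads
print(n)                                              # → 33333334
print(expected_coverage(n, 150, 0.50, 5_000_000))     # dominant member at 50%: about 500x
print(round(fraction_covered(1.0), 3))                # → 0.632
```

About 33 million reads give the rare member only 10x while the dominant member sits near 500x in the same run, which is why assemblers that assume uniform depth struggle with metagenomes.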
WGS sequence processing workflow
Filter for low quality reads
Assemble reads
Perform taxonomic classification and compute diversity metrics
WGS workflow step: Filter for low quality reads
• Quality plots and read trimming
– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
– FASTX http://hannonlab.cshl.edu/fastx_toolkit/
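The quality plots mentioned here summarize base quality by read position; FastQC's per-base quality view, for example, shows the score distribution at each position. The sketch below computes only the mean score per position from a FASTQ stream, assuming the common 4-line record layout and Phred+33 encoding. `read_fastq` and `per_base_mean_quality` are hypothetical helpers, not FastQC or FASTX code.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq, qual


def per_base_mean_quality(records: Iterable[Tuple[str, str, str]], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    sums: List[int] = []
    counts: List[int] = []
    for _id, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / n for s, n in zip(sums, counts)]


fastq = io.StringIO("@r1\nACG\n+\nII#\n@r2\nAC\n+\nI#\n")
print(per_base_mean_quality(read_fastq(fastq)))  # → [40.0, 21.0, 2.0]
```

A drop in mean quality toward the 3' end is the usual signal for choosing a trimming threshold.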
WGS workflow step: Assemble reads
• NGS assembly with uneven depth
– IDBA-UD http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/
– MIRA http://www.chevreux.org/projects_mira.html
– Velvet http://www.ebi.ac.uk/~zerbino/velvet/
– MetaVelvet http://metavelvet.dna.bio.keio.ac.jp/
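Assemblers such as Velvet and IDBA-UD are built on de Bruijn graphs. This minimal sketch uses a single k: it counts k-mers, keeps those seen at least `min_count` times, links overlapping (k-1)-mers and spells out the non-branching paths (unitigs). It also shows why uneven depth hurts: a global `min_count` that removes error k-mers from an abundant genome can also delete the true k-mers of a rare one, fragmenting its assembly. The metagenome assemblers listed above add strategies for uneven depth that this sketch omits; all names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def solid_kmers(reads: Iterable[str], k: int, min_count: int = 1) -> Set[str]:
    """k-mers seen at least min_count times (a crude sequencing-error filter)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {km for km, c in counts.items() if c >= min_count}


def unitigs(kmers: Set[str]) -> List[str]:
    """Spell out maximal non-branching paths of the de Bruijn graph.

    Nodes are (k-1)-mers and each k-mer is an edge prefix -> suffix.
    Isolated cycles are not reported in this sketch.
    """
    out_edges: Dict[str, List[str]] = defaultdict(list)
    in_deg: Counter = Counter()
    for km in kmers:
        out_edges[km[:-1]].append(km[1:])
        in_deg[km[1:]] += 1

    def simple(node: str) -> bool:
        # exactly one edge in and one edge out: interior of a unitig
        return in_deg[node] == 1 and len(out_edges.get(node, [])) == 1

    contigs = []
    for node in set(out_edges) | set(in_deg):
        if simple(node):
            continue
        for nxt in out_edges.get(node, []):
            path = node + nxt[-1]
            while simple(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs


genome = "GATTACACCGGTAAGC"  # toy sequence with no repeated 4-mers
reads = [genome[0:10], genome[4:14], genome[8:16]]
print(unitigs(solid_kmers(reads, k=5)))  # → ['GATTACACCGGTAAGC']
```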
WGS workflow step: Perform taxonomic classification and compute diversity metrics
• Hybrid composition/homology based classifiers
– FCP http://kiwi.cs.dal.ca/Software/FCP
– Phymm/PhymmBL http://www.cbcb.umd.edu/software/phymm/
– AMPHORA2 http://wolbachia.biology.virginia.edu/WuLab/Software.html
– NBC http://nbc.ece.drexel.edu/
– MEGAN http://ab.inf.uni-tuebingen.de/software/megan/
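On the homology side, MEGAN assigns each read to the lowest common ancestor (LCA) of the taxa it hits in a similarity search such as BLAST. A minimal sketch of that idea: discard weak hits, keep hits whose score is within a percentage of the best one, and take the longest shared prefix of their lineages. The parameter names echo MEGAN's score filters, but the defaults and exact behavior here are assumptions, and the hits are hypothetical.

```python
from typing import List, Sequence, Tuple

Lineage = Tuple[str, ...]


def lowest_common_ancestor(lineages: Sequence[Lineage]) -> Lineage:
    """Longest shared prefix of root-to-leaf lineages."""
    common: List[str] = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return tuple(common)


def assign_read(hits: Sequence[Tuple[float, Lineage]], top_percent: float = 10.0,
                min_score: float = 50.0) -> Lineage:
    """LCA of all hits scoring within top_percent of the best hit; () if unassigned."""
    good = [(score, lin) for score, lin in hits if score >= min_score]
    if not good:
        return ()
    best = max(score for score, _ in good)
    cutoff = best * (1.0 - top_percent / 100.0)
    return lowest_common_ancestor([lin for score, lin in good if score >= cutoff])


ENTERO = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Enterobacteriaceae")
hits = [  # hypothetical (bit score, lineage) hits for one read
    (200.0, ENTERO + ("Escherichia",)),
    (190.0, ENTERO + ("Salmonella",)),
    (120.0, ("Bacteria", "Firmicutes", "Bacilli", "Bacillales", "Bacillaceae", "Bacillus")),
]
print(assign_read(hits)[-1])                # → Enterobacteriaceae
print(assign_read(hits, top_percent=50.0))  # → ('Bacteria',)
```

Loosening the score window pulls in the weaker hit and pushes the assignment up to the domain level, which is the usual trade-off between specificity and conservative assignment.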
WGS workflow step: Perform taxonomic classification and compute diversity metrics (continued)
• Web based classifiers
– MG-RAST http://metagenomics.anl.gov/
– CAMERA http://camera.calit2.net/
– IMG/M http://img.jgi.doe.gov/cgi-bin/m/main.cgi
MetaPhlAn
• Unique clade-specific markers for sequenced bacteria and archaea
• 400 genera/4000 genomes including HMP genomes
• Species-level resolution
• MetaPhlAn 2 in the works
– Eukaryotes including fungi
– Viruses
– Higher coverage of archaea
• Krona and GraPhlAn for visualization of output
• Websites
– https://bitbucket.org/nsegata/metaphlan
– http://huttenhower.sph.harvard.edu/metaphlan
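Marker-based profilers like MetaPhlAn estimate a clade's abundance from the reads that hit its clade-specific markers, normalized by marker length and averaged over the clade's markers. The sketch below shows that calculation with a simple trimmed mean; MetaPhlAn's actual averaging and filtering differ, and the marker names, lengths and read counts are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def clade_abundances(marker_hits: Dict[str, int], marker_len: Dict[str, int],
                     marker_clade: Dict[str, str], trim: float = 0.0) -> Dict[str, float]:
    """Relative abundance per clade from reads hitting clade-specific markers."""
    per_clade: Dict[str, List[float]] = {}
    for marker, clade in marker_clade.items():
        # reads per base of marker; markers with no hits count as zero coverage
        cov = marker_hits.get(marker, 0) / marker_len[marker]
        per_clade.setdefault(clade, []).append(cov)
    raw: Dict[str, float] = {}
    for clade, covs in per_clade.items():
        covs.sort()
        cut = int(len(covs) * trim)  # number of markers dropped from each end
        kept = covs[cut:len(covs) - cut] or covs
        raw[clade] = mean(kept)
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()} if total > 0 else raw


# Hypothetical marker database and read counts
marker_clade = {"m1": "Clade_A", "m2": "Clade_A", "m3": "Clade_B"}
marker_len = {"m1": 1000, "m2": 500, "m3": 1000}
marker_hits = {"m1": 10, "m2": 5, "m3": 30}
print(clade_abundances(marker_hits, marker_len, marker_clade))  # Clade_A ≈ 0.25, Clade_B ≈ 0.75
```

Normalizing by marker length keeps a clade with long markers from looking more abundant just because its markers collect more reads.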
PhyloSift/pplacer
• Reference database of marker genes
• Places reads on the tree of life based on homology to reference proteins
• Integration with metAMOS for pre-assembling next-generation datasets
• Bacterial and archaeal classification only
• Plant and fungal marker genes are being added
• Websites
– http://phylosift.wordpress.com/
– https://github.com/gjospin/PhyloSift
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
Acknowledgements
Funding
Magdalen Lindeberg Cornell University
Dave Schneider USDA-ARS, Ithaca
Citrus greening / Wolbachia (wACP)
Thank you!
Surya Saha
Suggestions
• Plan informatics workflow as early as possible
• Incorporate statistics at different stages in the workflow