Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell...
The problem
Binning: clustering sequences with the same origin together
A corner piece? GREAT! But where is the rest of the puzzle?
Drew Sheneman, New Jersey -- The Newark Star Ledger
Potato processing wastewater treatment plant at Olburgen, The Netherlands
Stable system operated since 2006
Images (left and middle): Abma et al., Water Science & Technology (2010)
Study site
Nitritation/anammox reactor (600 m³)
Figure: reactor schematic with markers at 0.2 m, 1.4 m, 2.6 m, 3.8 m and 5.0 m; samples numbered 1 to 8
Sampling strategy: 8 samples
- Sample location
- Sample treatment: total sample or washed granules
- DNA isolation: organic extraction or Powersoil kit
Data handles
Sequence composition
Prior knowledge (Databases)
Sequence abundance
Mate pair & Paired end
Data handles: mate pair and paired end
Data handles: databases
Data handles: composition
Limited chemical signature
Biological information
- Codon usage (tetramer frequency)
‘Unique’ long k-mers
Contig/read length matters!
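A minimal sketch of the composition handle, assuming plain DNA strings as input. It computes the relative frequency of all 256 tetramers; the function name is illustrative. A short contig yields only a handful of 4-base windows, so its profile is noisy, which is one way to read "length matters". Many tools additionally merge each tetramer with its reverse complement; that step is left out here.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 256 possible 4-mers."""
    sequence = sequence.upper()
    # Slide a window of 4 bases over the sequence and count each window.
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    # Only windows made of unambiguous bases (A, C, G, T) are kept.
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


# A 10-base sequence gives only 7 windows: far too few for a stable profile.
freqs = tetranucleotide_frequencies("ACGTACGTAC")
```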
DNA isolation and library preparation
Sequencing and assembly
Data handles: abundance
Abundance in the sample correlates with abundance in reads
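The abundance handle is usually expressed as average coverage (sequencing depth) per contig. A rough sketch, assuming the lengths of the reads mapped to a contig are already known; the function name and inputs are illustrative, not taken from any particular tool.

```python
def contig_coverage(contig_length, mapped_read_lengths):
    """Average sequencing depth: total mapped bases divided by contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(mapped_read_lengths) / contig_length


# 50 reads of 100 bases mapped onto a 1000-base contig.
coverage = contig_coverage(1000, [100] * 50)
```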
Many roads try to get to Rome
Reference-based and reference-independent binning methods
Composition:
- GC content
- Tetranucleotide frequencies
Abundance:
- Long k-mer copy number
- Contig coverage
Content:
- Essential single-copy genes
Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Briefings in Bioinformatics 13, 669–681 (2012).
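Essential single-copy genes can serve as a sanity check on a bin: a gene that is missing hints at an incomplete bin, a gene found twice hints at a mixed bin. A small sketch under that assumption; the marker names and the function are illustrative only, and real marker sets are much larger.

```python
def marker_gene_summary(marker_counts, expected_markers):
    """Summarise how many expected single-copy genes are present or duplicated."""
    found = [m for m in expected_markers if marker_counts.get(m, 0) > 0]
    duplicated = [m for m in expected_markers if marker_counts.get(m, 0) > 1]
    n = len(expected_markers)
    return {
        "completeness": len(found) / n,    # fraction of markers seen at least once
        "duplication": len(duplicated) / n,  # fraction of markers seen more than once
    }


# Hypothetical bin: three of four markers found, one of them twice.
expected = ["rpoB", "recA", "gyrB", "rpsC"]
summary = marker_gene_summary({"rpoB": 1, "recA": 2, "gyrB": 1}, expected)
```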
Binning approaches
(This is not an exhaustive list…)
- Assembly independent read binning
- Binning on GC content and coverage
- Tetranucleotide ESOM
- Differential coverage based binning
  - Nucleotide extraction bias
  - Different samples
- Hi-C metagenomics
Assembly independent binning
Wang, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics 28, i356–i362 (2012).
T = long k-mer abundance
w = long k-mer length
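To illustrate how the two parameters interact, here is a rough sketch that counts all k-mers of length w across the reads and splits the reads by whether their typical k-mer count reaches T. This is only an illustration of the idea of long k-mer abundance, not the MetaCluster 5.0 algorithm; the function names and the use of the median are assumptions.

```python
from collections import Counter
from statistics import median


def count_long_kmers(reads, w):
    """Count every k-mer of length w over all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - w + 1):
            counts[read[i:i + w]] += 1
    return counts


def split_reads_by_abundance(reads, w, T):
    """Split reads into (high, low) by the median count of their w-mers."""
    counts = count_long_kmers(reads, w)
    high, low = [], []
    for read in reads:
        kmers = [read[i:i + w] for i in range(len(read) - w + 1)]
        if not kmers:
            low.append(read)  # read shorter than w: no evidence of abundance
            continue
        typical = median(counts[k] for k in kmers)
        (high if typical >= T else low).append(read)
    return high, low


reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAA", "TTGCATGCCA"]
high, low = split_reads_by_abundance(reads, w=8, T=3)
```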
Separating genomes: binning
Binning based on coverage and GC content
Figure: contigs plotted by sequencing depth against GC content
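In such a plot a bin is typically picked as a window in GC content and sequencing depth. A minimal sketch, assuming each contig comes with its sequence and an already computed depth; the contig names, sequences and thresholds below are made up for illustration.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    return gc / (gc + at) if gc + at else 0.0


def select_bin(contigs, gc_range, depth_range):
    """Return names of contigs that fall inside a GC/depth window.

    contigs maps a contig name to a (sequence, depth) tuple.
    """
    gc_low, gc_high = gc_range
    depth_low, depth_high = depth_range
    return sorted(
        name
        for name, (sequence, depth) in contigs.items()
        if gc_low <= gc_content(sequence) <= gc_high
        and depth_low <= depth <= depth_high
    )


contigs = {
    "contig_a": ("GCGCGCATGC", 120.0),
    "contig_b": ("GCGGCCATGC", 110.0),
    "contig_c": ("ATATATATGC", 115.0),  # similar depth, different GC
    "contig_d": ("GCGCGCGCAT", 8.0),    # similar GC, different depth
}
selected = select_bin(contigs, gc_range=(0.6, 0.9), depth_range=(100, 130))
```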
Binning: tetranucleotide ESOM
Dick, G. J., Andersson, A. F., Baker, B. J. & Simmons, S. L. Community-wide analysis of microbial genome sequence signatures. Genome Biology (2009).
Emergent Self Organizing Map (ESOM) based on tetranucleotide frequency
Using nucleotide extraction bias to separate organisms
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31, 533–538 (2013).
Binning: differential coverage binning
http://madsalbertsen.github.io/multi-metagenome/
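The core idea can be sketched with two coverage values per contig, for example from two DNA extraction methods: contigs from the same genome should shift in coverage by roughly the same factor between the two samples. The grouping below is a deliberately crude illustration under that assumption, not the multi-metagenome workflow; function names, the pseudocount and the bin width are all assumptions. A real analysis would also use the absolute coverage in each sample.

```python
import math


def log2_coverage_ratio(cov_a, cov_b, pseudocount=1.0):
    """log2 of the coverage ratio between two samples (pseudocount avoids /0)."""
    return math.log2((cov_a + pseudocount) / (cov_b + pseudocount))


def group_by_ratio(coverages, bin_width=1.0):
    """Group contigs whose log2 coverage ratios fall in the same interval.

    coverages maps a contig name to (coverage in sample A, coverage in sample B).
    """
    groups = {}
    for name, (cov_a, cov_b) in coverages.items():
        key = math.floor(log2_coverage_ratio(cov_a, cov_b) / bin_width)
        groups.setdefault(key, []).append(name)
    return groups


coverages = {
    "contig_1": (59.0, 19.0),   # about 3x higher in sample A
    "contig_2": (119.0, 39.0),  # about 3x higher in sample A
    "contig_3": (9.0, 29.0),    # about 3x higher in sample B
    "contig_4": (19.0, 59.0),   # about 3x higher in sample B
}
groups = group_by_ratio(coverages)
```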
Differential coverage binning: crAss
Differential coverage binning: GroopM
http://minillinim.github.io/GroopM/
Imelfort, M., Parks, D., Woodcroft, B. J. & Dennis, P. GroopM: An automated tool for the recovery of population genomes from related metagenomes. (2014).
Differential coverage binning: CONCOCT
Alneberg, J. et al. CONCOCT: Clustering cONtigs on COverage and ComposiTion. (2013).
Differential coverage binning: ESOM
Kantor, R. S. et al. Small Genomes and Sparse Metabolisms of Sediment-Associated Bacteria from Four Candidate Phyla. mBio 4, e00708-13 (2013).
Differential coverage binning: ESOM
Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol 32, 822–828 (2014).
Determining what belongs together by crosslinking total cell content
1) Crosslink
2) Cut DNA
3) Religate randomly
4) Sequence paired-end library of both crosslinked and native sample
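Once the crosslinked library is sequenced, read pairs whose two mates map to different contigs suggest that those contigs sat in the same cell. A minimal sketch of turning such links into clusters, assuming the input is a list of (contig, contig) pairs, one per read pair; the input format, the `min_links` threshold and the simple connected-components clustering are assumptions for illustration, not the method of any cited paper.

```python
from collections import Counter


def cluster_by_hic_links(read_pairs, min_links=2):
    """Cluster contigs connected by at least min_links Hi-C read pairs."""
    links = Counter()
    contigs = set()
    for x, y in read_pairs:
        contigs.update((x, y))
        if x != y:  # pairs within one contig carry no binning information
            links[tuple(sorted((x, y)))] += 1

    # Union-find: each contig starts as its own cluster.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    # Merge clusters joined by enough links; weak links are treated as noise.
    for (x, y), n in links.items():
        if n >= min_links:
            parent[find(x)] = find(y)

    clusters = {}
    for c in contigs:
        clusters.setdefault(find(c), set()).add(c)
    return sorted(sorted(members) for members in clusters.values())


pairs = (
    [("c1", "c2")] * 3
    + [("c2", "c3")] * 2
    + [("c4", "c5")] * 4
    + [("c3", "c4")]  # a single spurious link between two organisms
)
clusters = cluster_by_hic_links(pairs)
```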
Binning: Hi-C metagenomics
Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. (2014). doi:10.7287/peerj.preprints.260v1
Clustering by organism (and even replicon!)
Roads less travelled…
Whichever method you choose, do a background check…
When analyzing a complex community, experimental design largely determines how much you can get out of it.
Binning: concluding remarks