Metagenomics is:
Why Do We Need Metagenomics?
Snapshot of bacterial community
Cannot be cultivated
Motivation
<1%
Monitoring the impact of pollutants on ecosystems
Discovery of new genes, enzymes…
- Global Ocean Sampling Expedition
Human Microbiome Project
JGI sequenced Acid Mine Drainage sample
Applications
Marker Gene Sequencing
16S rRNA:
Two ways
Other marker genes: RuBisCO, NifH
Only composition
Whole Genome Sequencing (WGS)
Detailed picture of community
Two Paradigms
Why not assemble reads?
ORFome assembler*
Three steps:
The putative ORFs are annotated for each read
ORFs are assembled using EULER
ORF homologs are searched for in the Integrated Microbial Genomes (IMG) database
Existing WGS assemblers
Sanger reads: Phrap, Celera, Arachne, JAZZ…
Short reads: Velvet, Newbler…
Current Status
* Y. Ye and H. Tang, "An orfome assembly approach to metagenomics sequences analysis." Journal of bioinformatics and computational biology, vol. 7, no. 3, pp. 455-471, June 2009
Genovo: De Novo Assembly for Metagenomes
Jonathan Laserson, Vladimir Jojic and Daphne Koller. RECOMB 2010, LNBI 6044, pp. 341-356, 2010
Main Idea
Propose a generative model for metagenome data
Using iterated conditional modes (ICM)
Using hill-climbing steps iteratively
Design a score for evaluation
Model
Initialize contigs:
Infinitely many contigs, each of infinite length
Partition the reads
Using the Chinese Restaurant Process
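The partition step can be illustrated with a minimal sketch of the standard Chinese Restaurant Process seating rule: each read joins an existing contig with probability proportional to the number of reads already in it, or opens a new contig with probability proportional to a concentration parameter. The function name `crp_partition` and the parameters `alpha` and `seed` are assumptions made for this illustration and do not come from the slides or from the Genovo implementation.

```python
import random


def crp_partition(n_reads, alpha, seed=0):
    """Assign n_reads reads to contigs with a Chinese Restaurant Process.

    alpha (> 0) is a hypothetical concentration parameter: larger values
    make it more likely that a read opens a new contig.
    Returns (assignments, counts), where assignments[i] is the contig
    index of read i and counts[k] is the number of reads in contig k.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []
    for _ in range(n_reads):
        # Existing contigs are weighted by their size, a new one by alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new contig
        else:
            counts[k] += 1     # join an existing contig
        assignments.append(k)
    return assignments, counts
```

Calling `crp_partition(50, alpha=1.0)` returns one contig label per read. Because large contigs attract more reads, the partition tends toward a few large clusters and many small ones, without fixing the number of contigs in advance.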
Algorithm
Using ICM
Starting from an initial condition, hill-climbing moves are performed iteratively
Move 1: Consensus Sequence:
Select the most frequent base at each position
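As a rough illustration of the consensus move, the sketch below takes reads that have already been placed on one contig and picks the most frequent base in every column. It assumes gap-free placements given as `(offset, sequence)` pairs, marks uncovered columns with `N`, and breaks ties alphabetically; these simplifications and the name `consensus_sequence` are assumptions for the example, not details from the slides.

```python
from collections import Counter


def consensus_sequence(placed_reads):
    """Return the per-column majority base for reads placed on one contig.

    placed_reads: list of (offset, sequence) pairs with offset >= 0.
    Alignments are assumed to be gap-free (no indels) for simplicity.
    """
    columns = {}
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    if not columns:
        return ""
    consensus = []
    for pos in range(max(columns) + 1):
        counts = columns.get(pos)
        if counts is None:
            consensus.append("N")  # no read covers this column
        else:
            # Highest count wins; ties are broken alphabetically.
            consensus.append(min(counts, key=lambda b: (-counts[b], b)))
    return "".join(consensus)
```

For example, `consensus_sequence([(0, "ACGT"), (1, "CGTA"), (2, "CTAA")])` resolves the disagreement in the third column by majority vote.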
Algorithm
Move 2: Read Mapping
For read i, first remove it, then recalculate its contig and alignment
First, for each potential location, compute the alignment
Then, select the location according to its probability
Filtering: consider only locations that share a common 10-mer with the read
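The 10-mer filter can be sketched as a k-mer index over the current contigs: a location is a candidate for a read only if the contig and the read share at least one exact 10-mer there. This is a simplified illustration with hypothetical names (`build_kmer_index`, `candidate_locations`); it ignores reverse complements and indels, and it only produces the candidate list that a full alignment step would then score.

```python
from collections import defaultdict


def build_kmer_index(contigs, k=10):
    """Map every k-mer to the set of (contig_id, position) where it occurs.

    contigs: dict of contig_id -> sequence string.
    """
    index = defaultdict(set)
    for cid, seq in contigs.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add((cid, pos))
    return index


def candidate_locations(read, index, k=10):
    """Return (contig_id, offset) pairs sharing at least one k-mer with read.

    offset is the contig coordinate at which the read would start if the
    shared k-mer were aligned without gaps.
    """
    candidates = set()
    for rpos in range(len(read) - k + 1):
        for cid, cpos in index.get(read[rpos:rpos + k], ()):
            candidates.add((cid, cpos - rpos))
    return candidates
```

Only the locations returned by `candidate_locations` would need a full alignment, which is what makes the filter worthwhile when there are many contigs and reads.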
Algorithm
Move 3: Update geometric variable
Global moves:
Propose indels
Center
Merge contigs
Chimeric reads
Disassemble the dangling contigs
Evaluation
BLAST
PFAM
Designed score
1st term: quality of assembly
2nd term: penalty for total length
3rd term: prefer to merge when V>V0
Discussion
New idea
Apply a mature algorithm to the assembly domain
Systematically describe and analyze the problem and algorithm
Results are better
Discussion
Slow: minutes vs. hours for 300k 454 reads
Main idea: try to extend contigs as long as possible, so they will have more hits for BLAST
Why choose 20 for V0?
How to deal with branching? Repeats?
Model:
Why can it capture the properties of metagenomic data?
How to argue the correctness of that model?
The distribution of starting points