GEBA A genomic encyclopedia of bacteria and archaea
• GEBA: A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs
Acidobacteria
Bacteroides
Fibrobacteres
Gemmatimonadetes
Verrucomicrobia
Planctomycetes
Chloroflexi
Proteobacteria
Chlorobi
Firmicutes
Fusobacteria
Actinobacteria
Cyanobacteria
Chlamydia
Spirochaetes
Deinococcus-Thermus
Aquificae
Thermotogae
TM6
OS-K
Termite Group
OP8
Marine Group A
WS3
OP9
NKB19
OP3
OP10
TM7
OP1
OP11
Nitrospira
Synergistes
Deferribacteres
Thermodesulfobacteria
Chrysiogenetes
Thermomicrobia
Dictyoglomus
Coprothermobacter
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify those with a cultured representative in DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering the breadth of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree-guided sequencing
• 1st paper: Wu et al. in Nature, Dec 2009
GEBA Phylogenomic Lessons
* The rRNA Tree of Life is a useful tool for identifying phylogenetically novel genomes
* Phylogeny-driven genome selection helps discover new genetic diversity
* Phylogeny-driven genome selection (and phylogenetics in general) improves genome annotation
* Phylogeny-driven genome selection improves analysis of genome data from uncultured organisms (though only modestly)
Organism Selection Method I
MaxPD: Select organisms so that phylogenetic diversity is maximized on a 16S rRNA tree
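To make the MaxPD idea concrete, here is a minimal greedy sketch in Python. It is an illustration under stated assumptions, not the GEBA pipeline: the toy tree, its branch lengths, and the names `greedy_max_pd`, `path_to_root`, and `TOY_TREE` are all hypothetical. Phylogenetic diversity (PD) is taken here to be the total branch length connecting the chosen leaves to the root, and each step picks the leaf that adds the most branch length not yet covered.

```python
# Toy rooted tree (hypothetical): child -> (parent, branch length to parent).
TOY_TREE = {
    "X": ("root", 1.0),
    "D": ("root", 5.0),
    "A": ("X", 1.0),
    "Y": ("X", 0.5),
    "B": ("Y", 0.2),
    "C": ("Y", 0.3),
}
TOY_LEAVES = ["A", "B", "C", "D"]


def path_to_root(node, tree):
    """Yield each edge (named by its child node) from `node` up to the root."""
    while node in tree:
        yield node
        node = tree[node][0]


def greedy_max_pd(tree, leaves, k, already_sequenced=()):
    """Greedily pick up to k leaves, each adding the most uncovered branch length.

    Leaves in `already_sequenced` count as covered from the start, so the
    picks favor branches that have no genome yet.
    Returns (chosen leaves, total covered branch length).
    """
    covered = set()
    for leaf in already_sequenced:
        covered.update(path_to_root(leaf, tree))

    candidates = [leaf for leaf in leaves if leaf not in already_sequenced]
    chosen = []
    for _ in range(min(k, len(candidates))):
        def gain(leaf):
            # Branch length this leaf would add beyond what is already covered.
            return sum(tree[e][1] for e in path_to_root(leaf, tree)
                       if e not in covered)

        best = max(candidates, key=gain)
        chosen.append(best)
        covered.update(path_to_root(best, tree))
        candidates.remove(best)

    total_pd = sum(tree[e][1] for e in covered)
    return chosen, total_pd


if __name__ == "__main__":
    print(greedy_max_pd(TOY_TREE, TOY_LEAVES, 2))
    print(greedy_max_pd(TOY_TREE, TOY_LEAVES, 1, already_sequenced=["A", "D"]))
```

On the toy tree, the first call picks the long isolated branch D and then A. With A and D treated as already sequenced, the next pick is C, the leaf that adds the most new branch length.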
CLUSTER_56 number of sequences=3 genome representatives=0 (10934,10867,237295)
Desulfobotulus sp. str. BG14
Desulfocella halophila str. GSL-But2 DSMZ:DSM11763 TYPE STRAIN
Desulfocella sp. str. DSM 2056 DSMZ:DSM2056
CLUSTER_57 number of sequences=3 genome representatives=1 (10775,10774,71864)
Desulfoarculus sp. str. BG74
Desulfovibrio baarsii str. 2st14
Desulfovibrio baarsii str. DSM 2075 DSMZ:DSM2075 Gi03014 TYPE STRAIN
MCL Clustering: divide the organisms in a phylogenetic group into subgroups
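As a rough illustration of how Markov clustering (MCL) splits a group into subgroups, here is a small pure-Python sketch. It is not the MCL program itself and not the GEBA pipeline. The slides do not say how the input graph was built, so the similarity matrix is an assumption (for example, pairwise 16S rRNA similarity), the values are invented toy numbers, and the function names are hypothetical. The loop alternates expansion (matrix squaring) and inflation (element-wise powering, then column normalization) until the matrix stops changing.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=50, tol=1e-6):
    """Cluster items given a symmetric similarity matrix; returns index lists."""
    n = len(similarity)
    # Add self-loops on the diagonal, then make the matrix column-stochastic.
    m = normalize_columns([[1.0 if i == j else similarity[i][j]
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns
    # are the members of that attractor's cluster.
    clusters = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy data (invented): items 0-2 are mutually similar, as are items 3-5.
HIGH, LOW = 0.9, 0.05
TOY_SIMILARITY = [[HIGH if (i < 3) == (j < 3) else LOW for j in range(6)]
                  for i in range(6)]

if __name__ == "__main__":
    print(mcl(TOY_SIMILARITY))
```

On this toy matrix the sketch separates items 0-2 from items 3-5. A higher `inflation` value tends to produce more, smaller clusters, and it is the main parameter to tune.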
Organism Selection Method II