Jonathan Eisen talk at ASM General Meeting 2010
A phylogeny driven genomic encyclopedia of bacteria and archaea
Jonathan A. Eisen
Talk at ASM GM, May 25, 2010
Tuesday, May 25, 2010
Fleischmann et al. 1995
Microbial genomes
From http://genomesonline.org
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
[Figure: phylum-level tree of bacteria, labeled 2002. Branch labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
• At least 40 phyla of bacteria
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria, labeled 2002; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria, labeled 2002; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria, labeled 2002; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria, labeled 2002; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
Based on Hugenholtz, 2002
The Tree is not Happy
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Why Increase Phylogenetic Coverage?
• Common approach within some eukaryotic groups
• Many small projects to fill in bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
• Many potential benefits
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
The Tree of Life is Still Angry
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Eukaryotes
Bacteria
Archaea
Major Lineages of Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
[Figure: phylum-level tree of bacteria with a legend entry "Well sampled phyla"; same branch labels as the first tree slide]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
http://www.jgi.doe.gov/programs/GEBA/pilot.html
A Genomic Encyclopedia of Bacteria and Archaea (GEBA)
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify branches with a cultured representative in DSMZ
• Grow > 200 of these and prep. DNA
• Sequence and finish 100 (covering breadth of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
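The selection idea in the overview above (pick organisms on branches that no sequenced genome covers yet) can be sketched as a greedy ranking by added branch length. This is only an illustration under stated assumptions: the toy tree, the taxon names, and the function names are hypothetical, and the slides do not say that GEBA used this exact procedure.

```python
# Hypothetical toy tree: node -> (parent, branch length to parent).
# The root has parent None. Leaves A-D stand in for cultured organisms.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.2),
    "n2": ("root", 0.5),
    "A": ("n1", 0.1),
    "B": ("n1", 0.1),
    "C": ("n2", 0.3),
    "D": ("root", 0.9),
}


def path_to_root(tree, leaf):
    """Return the nodes whose parent branches lie between leaf and root."""
    nodes = set()
    node = leaf
    while tree[node][0] is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes


def rank_candidates(tree, sequenced, candidates, n_pick):
    """Greedily pick candidates that add the most uncovered branch length."""
    covered = set()
    for leaf in sequenced:
        covered |= path_to_root(tree, leaf)

    def gain(leaf):
        # Branch length on this leaf's path that no chosen genome covers yet.
        return sum(tree[n][1] for n in path_to_root(tree, leaf) - covered)

    picks = []
    remaining = set(candidates)
    for _ in range(min(n_pick, len(remaining))):
        best = max(sorted(remaining), key=gain)  # sorted() makes ties stable
        picks.append((best, round(gain(best), 6)))
        covered |= path_to_root(tree, best)
        remaining.remove(best)
    return picks


# A is already sequenced; choose two more genomes from B, C, D.
picks = rank_candidates(TREE, sequenced={"A"}, candidates={"B", "C", "D"}, n_pick=2)
print(picks)
```

In this toy case B is never chosen: it shares most of its path with the already sequenced A, so it adds little new branch length.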
GEBA and Openness
• All data released as quickly as possible w/ no restrictions to IMG-GEBA, GenBank, etc.
• Data also available in Biotorrents (http://biotorrents.net)
• Individual genome reports published in OA “Standards in Genomic Sciences (SIGS)”
• 1st GEBA paper in Nature freely available and published using Creative Commons License
GEBA Lesson 1
rRNA Tree is Useful for Identifying Phylogenetically Novel Genomes
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Network of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Whole Genome Tree w/ AMPHORA
http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
See Wu and Eisen, Genome Biology 2008 9: R151
Compare PD in Trees
PD of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
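As a rough illustration of comparing PD between trees, the sketch below takes phylogenetic diversity to be the summed branch length connecting a set of taxa to the root (the usual Faith's PD definition, which is assumed here rather than stated on the slide) and reports the fraction of each tree's total PD that a genome set captures. Both toy trees and all names are hypothetical.

```python
def branch_length_sum(tree, leaves):
    """Summed length of all branches on the paths from the leaves to the root."""
    seen = set()
    for leaf in leaves:
        node = leaf
        # Stop early once we reach a node whose ancestors were already counted.
        while tree[node][0] is not None and node not in seen:
            seen.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in seen)


def pd_fraction(tree, subset):
    """Fraction of the tree's total PD captured by the given subset of leaves."""
    parents = {parent for parent, _ in tree.values()}
    all_leaves = set(tree) - parents
    return branch_length_sum(tree, subset) / branch_length_sum(tree, all_leaves)


# Hypothetical trees over the same taxa: node -> (parent, branch length).
rrna_tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.2),
    "A": ("n1", 0.1),
    "B": ("n1", 0.1),
    "C": ("root", 0.6),
}
genome_tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.4),
    "A": ("n1", 0.2),
    "B": ("n1", 0.2),
    "C": ("root", 1.2),
}

sequenced = {"A", "C"}
print(pd_fraction(rrna_tree, sequenced), pd_fraction(genome_tree, sequenced))
```

Using a fraction rather than raw PD makes the two trees comparable even though their branch lengths are on different scales.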
GEBA Lesson 1B
rRNA Tree topology is not perfect; Genome-based trees better
16S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005
WGT and individual gene trees: It's Related to Caulobacterales
Badger et al. 2005
Wh
Concatenated alignment “whole genome tree” built using AMPHORA
Whole genome phylogeny?
• Many approaches
– Gene presence/absence
– Concatenation of phylogenetic markers
– Separate phylogeny of genes and then integration of results (e.g., networks)
– Models that incorporate gain/loss as well as gene phylogeny
• No new results from us
– However ... see Eric Alm talk Ballroom A - “Microbes in a changing world” session tomorrow AM
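Of the approaches listed above, concatenation of phylogenetic markers is the simplest to show in code. The following is a minimal sketch, assuming each marker has already been aligned separately; the marker names are real gene names but the sequences, taxon names, and the `concatenate` function are made up for illustration, and this is not AMPHORA's own code.

```python
def concatenate(alignments, gap="-"):
    """Join per-marker alignments into one supermatrix.

    alignments: dict marker -> dict taxon -> aligned sequence.
    Taxa missing from a marker are padded with gap characters.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for marker in sorted(alignments):  # fixed marker order for reproducibility
        aln = alignments[marker]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{marker}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two markers.
alignments = {
    "rpoB": {"taxA": "MKV", "taxB": "MRV"},
    "recA": {"taxA": "AI-D", "taxB": "AIED", "taxC": "AIEE"},
}
supermatrix = concatenate(alignments)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The concatenated alignment would then be passed to whatever tree-building program is in use; that step is outside this sketch.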
GEBA Lesson 2
Phylogeny-driven genome selection helps discover new genetic diversity
Network of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
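The three steps above can be sketched as follows, assuming the MCL clustering has already been run and each genome has been reduced to the set of protein family IDs it contains. Averaging over random genome orders is an assumption of this sketch (a common way to smooth such curves), and the genome and family names are hypothetical.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1..N genomes.

    genome_families: dict genome -> set of protein family IDs.
    Genomes are added in random order; results are averaged over permutations.
    """
    rng = random.Random(seed)
    genomes = sorted(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: family IDs per genome (e.g. taken from MCL clusters).
example = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f4"},
}
curve = rarefaction_curve(example)
print(curve)  # x = number of genomes (1..3), y = mean number of families
```

A curve that keeps climbing as genomes are added indicates that new genomes are still contributing previously unseen families.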
Synapomorphies exist
GEBA Lesson 3
Phylogeny-driven genome selection improves genome annotation
Predicting Function
• Key step in genome projects
• More accurate predictions help guide experimental and computational analyses
• Many diverse approaches
• Comparative and evolutionary analysis greatly improves most predictions
Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling
• Better definition of protein family sequence “patterns” (e.g., improved HMMs)
• Conversion of hypothetical into conserved hypotheticals
• Greatly improves “comparative” and “evolutionary” based predictions
• Linking distantly related members of protein families
• Improved non-homology prediction
From Wu et al. 2009.
GEBA Lesson 4
Phylogeny-driven genome selection improves analysis of genome data from uncultured organisms
Metagenomics Challenge
1. Who is out there?
2. What are they doing?
Who is out there?
• Mimic rRNA PCR based studies
• But can now do these with other genes
rRNA phylotyping from metagenomics
Venter et al., 2004
Shotgun Sequencing Allows Use of Alternative Anchors (e.g., RecA)
Venter et al., 2004
Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]
Venter et al., 2004
Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart "Sargasso Phylotypes"; x-axis: Major Phylogenetic Group; y-axis: Weighted % of Clones (0 to 0.5); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]
Venter et al., 2004
Should improve with better genomic sampling
Functional Inference from Metagenomics
• Can work well for individual genes
• Predicting “community” function is challenging because treating community as a bag of genes does not work well
• Better to “compartmentalize” data ...
ABCDEFG
TUVWXYZ
Binning challenge
Best binning method: reference genomes
Reference Genomes Coming from Select Environment
ABCDEFG
TUVWXYZ
Binning challenge
No reference genome? What do you do?
Phylogeny ....
AMPHORA
Guide tree
Phylogenetic Binning Using AMPHORA
[Figure: bar chart, y-axis 0 to 0.7. x-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. One series per marker gene: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
AMPHORA - each read on its own tree
Phylogenetic Binning Using AMPHORA
[Figure: bar chart of phylogenetic groups (Alphaproteobacteria through Unclassified Bacteria), y-axis 0 to 0.7, one series per AMPHORA marker gene (dnaG through tsf)]
AMPHORA - each read on its own tree
Should improve with better genomic sampling
Metagenomic Analysis Improves w/ Phylogenetic Sampling
• Small but real improvements in
– Gene identification / confirmation
– Functional prediction
– Binning
– Phylogenetic classification
• But not a lot ...
How to improve phylogenetic analysis of metagenomic data
• Fragmented data
• Which genes to use?
• More automation
iSEEM Project
Phylogenetic challenge
A single tree with everything
Phylogenetic Binning Using AMPHORA
[Figure: bar chart of phylogenetic groups (Alphaproteobacteria through Unclassified Bacteria), y-axis 0 to 0.7, one series per AMPHORA marker gene (dnaG through tsf)]
AMPHORA - each read on its own tree
Improves with better phylogenetic methods
Phylogenetic Binning Using AMPHORA
[Figure: bar chart of phylogenetic groups (Alphaproteobacteria through Unclassified Bacteria), y-axis 0 to 0.7, one series per AMPHORA marker gene (dnaG through tsf)]
AMPHORA - each read on its own tree
Improves with more gene families
New “Marker Genes”
• 100 representative genomes
• MCL gene families
• Identify gene families w/
– High universality
– High uniformity of copy number
– Phylogenetic tree similar to “whole genome tree”
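The first two screening criteria above can be illustrated with a small filter over a table of copy numbers. This is a sketch under assumptions: universality is taken to mean the fraction of genomes with at least one copy, uniformity the fraction with exactly one copy, and the thresholds, family names, and genome names are all hypothetical. The third criterion (agreement with the whole genome tree) needs tree comparison and is not covered here.

```python
def screen_markers(copy_counts, genomes, min_universality=0.9, min_single_copy=0.9):
    """Return gene families that are near-universal and near-single-copy.

    copy_counts: dict family -> dict genome -> number of copies.
    genomes: list of all genome names in the data set.
    """
    candidates = []
    for family, counts in sorted(copy_counts.items()):
        present = sum(1 for g in genomes if counts.get(g, 0) >= 1)
        single = sum(1 for g in genomes if counts.get(g, 0) == 1)
        universality = present / len(genomes)
        single_copy = single / len(genomes)
        if universality >= min_universality and single_copy >= min_single_copy:
            candidates.append(family)
    return candidates


# Hypothetical copy-number table for four genomes.
genomes = ["g1", "g2", "g3", "g4"]
copy_counts = {
    "fam1": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # universal, single copy
    "fam2": {"g1": 1, "g2": 1},                    # present in only half
    "fam3": {"g1": 2, "g2": 2, "g3": 1, "g4": 1},  # universal but duplicated
}
markers = screen_markers(copy_counts, genomes, min_universality=0.75, min_single_copy=0.75)
print(markers)
```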
Distances between gene trees and the AMPHORA concatenated genome tree
[Figure: two panels, one bar per gene, in the order shown on the slide.
NODAL distance (axis 0 to 6): rRNA16S, ruvB, nusA, rplB, purA, rpsJ, secY, rpsI, pyrH, rpsE, rplP, rplN, rpsC, ruvA, rplF, rplA, serS, rplK, rpsK, priA, smpB, rpsG, guaA, rpsQ, rpsL, rplU, rplO, rpsM, infC, rplS, rplV, rplC, rpsP, rplE, rplT, rplL, rplQ, rpsH, mraW, rpsO, rpsB, rplI, rplM, rplR, ttf, frr, tsf, rplD, radA, rpsS, trmD, coaE, rpmA.
SPLIT distance (axis 0 to 0.9): nusA, rpsC, rpsE, priA, rplB, secY, rRNA16S, rpsJ, rpsB, ruvB, guaA, rplN, serS, rplF, frr, rplA, rplE, rplC, infC, rplD, rplK, purA, radA, ruvA, rpsM, pyrH, rplI, rplM, rpsG, rpsL, mraW, rpsI, ttf, rplS, trmD, tsf, rplU, rpsK, rpsP, rplO, rplT, rplV, rpsS, rplP, rpsO, smpB, rpsH, rplQ, rplR, rpsQ, rplL, rpmA, coaE.
Legend: Ribosomal protein; Transcription/translation related protein; DNA repair protein; Protein of other function; AMPHORA marker.
Reference line: Distance between the genome tree and 100 random trees (average ± standard deviation)]
Screen gene markers for any given taxonomic group
Phylogenetic group       Genome number   Gene number   Marker candidates
Archaea                  62              145415        106
Actinobacteria           63              267783        136
Alphaproteobacteria      94              347287        121
Betaproteobacteria       56              266362        311
Gammaproteobacteria      126             483632        118
Deltaproteobacteria      25              102115        206
Epsilonproteobacteria    18              33416         455
Bacteroides              25              71531         286
Chlamydiae               13              13823         560
Chloroflexi              10              33577         323
Cyanobacteria            36              124080        590
Firmicutes               106             312309        87
Spirochaetes             18              38832         176
Thermi                   5               14160         974
Thermotogae              9               17037         684
Phylogenetic Binning Using AMPHORA
[Figure: bar chart of phylogenetic groups (Alphaproteobacteria through Unclassified Bacteria), y-axis 0 to 0.7, one series per AMPHORA marker gene (dnaG through tsf)]
AMPHORA - each read on its own tree
Improves with better automation
Zorro
• http://sourceforge.net/projects/probmask/
• ZORRO is a probabilistic masking program that assigns confidence scores to each column in a multiple sequence alignment. These scores can then be used to account for alignment accuracy in phylogenetic inference pipelines
• Wu, Chatterji, Eisen submitted
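One simple way a downstream pipeline could use per-column confidence scores is to drop low-scoring columns before tree building. The sketch below assumes the scores are already available as a plain list with one number per alignment column; it does not call ZORRO or read its output format, and the score values, threshold, and function name are hypothetical.

```python
def mask_alignment(alignment, column_scores, min_score):
    """Keep only alignment columns whose confidence score is at least min_score.

    alignment: dict sequence name -> aligned sequence (all the same length).
    column_scores: one confidence score per alignment column.
    """
    widths = {len(seq) for seq in alignment.values()}
    if widths != {len(column_scores)}:
        raise ValueError("need exactly one score per alignment column")
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical alignment and per-column scores (higher = more reliable).
alignment = {"seqA": "MK-VL", "seqB": "MRAVL"}
scores = [9.5, 8.0, 1.2, 7.5, 9.9]
masked = mask_alignment(alignment, scores, min_score=5.0)
print(masked)
```

A hard cutoff is the bluntest option; scores could instead be used as column weights where the inference program supports that.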
GEBA Phylogenomic Lesson 5
We have still only scratched the surface of microbial diversity
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Archaea
Eukaryotes
Bacteria
Phylogenetic Diversity: Sequenced Bacteria & Archaea
From Wu et al. 2009
Phylogenetic Diversity with GEBA
From Wu et al. 2009
Phylogenetic Diversity: Isolates
From Wu et al. 2009
Phylogenetic Diversity: All
From Wu et al. 2009
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide. Legend: Well sampled phyla; Poorly sampled; No cultured taxa]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
GEBA Phylogenomic Lesson 6
Need Experiments from Across the Tree of Life too
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
As of 2002
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
As of 2002
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
As of 2002
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
As of 2002
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria; same branch labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
As of 2002
Based on Hugenholtz, 2002
[Figure: phylum-level tree of bacteria with a 0.1 scale bar; same branch labels as the first tree slide]
Tree based on Hugenholtz (2002) with some modifications.
Need experimental studies from across the tree too
MICROBES
A Happy Tree of Life