Eisen.Geba.Jgi2009b

GEBA: A genomic encyclopedia of bacteria and archaea. Jonathan A. Eisen. JGI User Meeting 2009.

description

Talk I gave at the JGI User Meeting 2009.

Transcript of Eisen.Geba.Jgi2009b

Page 1: Eisen.Geba.Jgi2009b

GEBA: A genomic encyclopedia of bacteria and archaea

Jonathan A. Eisen

JGI User Meeting 2009

Page 2: Eisen.Geba.Jgi2009b

“Nothing in biology makes sense except in the light of evolution.”

T. Dobzhansky (1973)

Page 3: Eisen.Geba.Jgi2009b

Page 4: Eisen.Geba.Jgi2009b

rRNA Tree of Life

Page 5: Eisen.Geba.Jgi2009b

The Tree is not Happy

Page 6: Eisen.Geba.Jgi2009b

From http://genomesonline.org

Page 7: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

As of 2002

Based on Hugenholtz, 2002

Page 8: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Based on Hugenholtz, 2002

Page 9: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Based on Hugenholtz, 2002

Page 10: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

As of 2002

Based on Hugenholtz, 2002

Page 11: Eisen.Geba.Jgi2009b

Need for Tree Guidance Well Established

• Common approach within some eukaryotic groups

• Many small projects funded to fill in some bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature

Page 12: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen, Ward, Badger, Wu, Wu, et al.

Page 13: Eisen.Geba.Jgi2009b

Bacterial AToL Project Aims

• Improve resolution of deep branches in the bacterial tree

• Launch biological studies of these phyla and discover functional novelty

• Leverage data for interpreting environmental surveys

Page 14: Eisen.Geba.Jgi2009b

T. roseum genome

Page 15: Eisen.Geba.Jgi2009b

The Tree of Life is Still Angry

Page 16: Eisen.Geba.Jgi2009b

Within Phyla Diversity Immense

• Each phylum represents billions of years of evolution

• Some have hundreds of major lineages

• New lineages are being discovered all the time

• Most branches within most phyla have few or no genomes

Page 17: Eisen.Geba.Jgi2009b

Major Lineages of Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

Page 18: Eisen.Geba.Jgi2009b

Additional Impetus for Tree Guided Projects

• Suggestion to sequence all bacteria and archaea in Bergey’s Manual (Stevens et al)

• Success in sequencing genomes from across the tree in animals

• Multiple government reports suggest a more systematic approach to sequencing is needed

Page 19: Eisen.Geba.Jgi2009b

[Figure: tree of bacterial phyla. Labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 100 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

• Solution - use tree to really fill gaps

Well sampled phyla

Page 20: Eisen.Geba.Jgi2009b

http://www.jgi.doe.gov/programs/GEBA/pilot.html

Page 21: Eisen.Geba.Jgi2009b

GEBA Pilot Project Overview

• Select 200 organisms using tree

• Develop high throughput pipeline for strain growth and DNA preparation

• Sequence and finish 100

• Annotate, analyze, release data

• Assess benefits of tree guided sequencing

Page 22: Eisen.Geba.Jgi2009b

GEBA Pilot I: Selecting Targets

Page 23: Eisen.Geba.Jgi2009b

Page 24: Eisen.Geba.Jgi2009b

Page 25: Eisen.Geba.Jgi2009b

Page 26: Eisen.Geba.Jgi2009b

GEBA Pilot Target List

[Figure: bar chart. Axes: Phyla and # of Genomes (0 to 35). Bacteria (B): Actinobacteria (High GC), Aminanaerobia, Aquificae, Bacteroidetes, Chloroflexi, Deferribacteres, Deinococci, Delta Proteobacteria, Epsilon Proteobacteria, Firmicutes, Fusobacteria, Gamma Proteobacteria, Gemmatimonadetes, Haloanaerobiales, Planctomycetes, Spirochaetes, Thermodesulfobacteria, Thermodesulfobia, Thermovenabulae. Archaea (A): Halobacteria, Archaeoglobi, Methanobacteria, Methanomicrobia, Thermococci, Thermoprotei.]

Page 27: Eisen.Geba.Jgi2009b

Page 28: Eisen.Geba.Jgi2009b

GEBA Pilot II: The Importance of Project Management

Page 29: Eisen.Geba.Jgi2009b

GEBA Project Flowchart

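[Flowchart boxes, in extraction order. The number in parentheses marks the responsible site: 1 = PGF, 2 = LANL, 3 = ORNL.]

• GEBA Proposal
• Scientific and Technical Review (1)
• Negotiate Scope of Work
• Receive Starting Material (1)
• OK?
• Project Initiation
• Column headings: Sequencing, Annotation
• Draft Sequencing and Assembly (1)
• Finish Sequencing and Assembly (2)
• IMG (1)
• Finish Annotation (3)
• Complete Genome GenBank Submission (1)
• Draft Annotation (3)
• Shotgun Genome GenBank Submission (1)
• IMG-ER (1)
• OK?
• OK?
• IMG-ER (1)
• Gene-QA (1)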

David Bruce, Lynne Goodwin et al

Page 30: Eisen.Geba.Jgi2009b

GEBA Pilot III: Partnership with DSMZ

Page 31: Eisen.Geba.Jgi2009b

GEBA Biggest Challenge: Getting DNA

• Getting quality DNA is the biggest bottleneck
• Solution: Beg, Borrow and Steal
• DSMZ offered to do it for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other collections

Page 32: Eisen.Geba.Jgi2009b

Page 33: Eisen.Geba.Jgi2009b

Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T)

Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
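The shipped amount quoted above is concentration times volume. A minimal sketch of that arithmetic follows; the helper name `dna_mass_ug` is hypothetical, and the reading that the 150 µg figure comes from the gel-based estimate (rather than the spectrophotometric one) is an assumption based on the numbers matching.

```python
def dna_mass_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Return DNA mass in micrograms for a concentration in ng/µl and a volume in µl."""
    return conc_ng_per_ul * volume_ul / 1000.0  # 1000 ng per µg


# 1 µg/ml is the same as 1 ng/µl, so 450 µg/ml can be entered as 450 ng/µl.
gel_estimate = dna_mass_ug(500, 300)      # gel-based concentration, 300 µl shipped
spectro_estimate = dna_mass_ug(450, 300)  # spectrophotometric concentration

print(gel_estimate, spectro_estimate)
```

With the gel-based 500 ng/µl the 300 µl aliquot works out to the quoted 150 µg; the spectrophotometric reading gives a somewhat lower mass for the same volume.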

Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng

Page 34: Eisen.Geba.Jgi2009b

GEBA Pilot IV: Sequencing, Annotation, Data Release

Page 35: Eisen.Geba.Jgi2009b

Current Status

• >100 in progress

• GEBA 56 (focus of first paper)
– 34 finished genomes
– 55 submitted to GenBank
– Released to IMG-GEBA page and JGI-FTP site

• All data are completely open for anyone to use

Page 36: Eisen.Geba.Jgi2009b

IMG/GEBA

http://img.jgi.doe.gov/cgi-bin/geba/main.cgi

Page 37: Eisen.Geba.Jgi2009b

Adopt a Microbe

Page 38: Eisen.Geba.Jgi2009b

GEBA Pilot IV: Assess Benefits of GEBA56

All genomes have some value

But what, if any, is the benefit of tree-guided sequencing over other selection methods?

Page 39: Eisen.Geba.Jgi2009b

Why Increase Taxonomic Coverage II?

• Gene discovery

• Annotation, functional prediction

• Metagenomic analysis

• Mechanisms of diversification

• Species phylogeny and classification

Page 40: Eisen.Geba.Jgi2009b

Page 41: Eisen.Geba.Jgi2009b

Value of diverse genomes I: Gene discovery

• Premise:
– New genomes frequently contain genetic novelty
– Phylogenetic diversity of a genome should be correlated to novelty

• Caveat:
– Does lateral gene transfer wipe out contribution of phylogenetic diversity to novelty?

Page 42: Eisen.Geba.Jgi2009b

Protein Family Rarefaction Curves

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
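The three steps above can be sketched as follows. This is a minimal illustration, not the pipeline used for the talk: it assumes the protein families have already been identified (for example by MCL clustering) and are available as a set of family IDs per genome, and it averages over random genome orderings, which is a common convention but not something the slide states. The function name and the toy data are made up.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(families_by_genome: Dict[str, Set[str]],
                      n_permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Mean number of distinct protein families seen after adding 1..N genomes."""
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)          # random order in which genomes are added
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)  # cumulative family count at this depth
    return [t / n_permutations for t in totals]


# Toy input (invented for illustration): family IDs per genome.
toy = {
    "genomeA": {"f1", "f2", "f3"},
    "genomeB": {"f2", "f3", "f4"},
    "genomeC": {"f5", "f6", "f7"},
}
curve = rarefaction_curve(toy)
print(curve)
```

Plotting `curve` against genome number (1 to N) gives the rarefaction curve; a set of closely related genomes flattens quickly, while a phylogenetically diverse set keeps climbing.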

Page 43: Eisen.Geba.Jgi2009b

Page 44: Eisen.Geba.Jgi2009b

[Figure: rarefaction curves. X axis: Genome Number (0 to 80). Y axis (0 to 350,000), labeled Total Gene Number and Number of proteins. Series: S. agalactiae, Enterobacteriaceae, Actinobacteria, Bacteria from GEBA project.]

Page 45: Eisen.Geba.Jgi2009b

Novelty 2 - Structural Novelty

• Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu)

• Structural modeling suggests many are structurally novel too (D'haeseleer)

• 372 being crystallized by the PSI (Kerfeld)

Page 46: Eisen.Geba.Jgi2009b

Novelty 3

Diversity within known families

Page 47: Eisen.Geba.Jgi2009b

Transporter Profiles

[Figure: stacked bar chart, one bar per GEBA genome (five-letter genome abbreviations on the x axis). Y axis: Number of transporters (0 to 700). Legend: inorganic ions; amino acids, nitro compounds and peptides; drugs/toxins; sugars; carboxylates; nucleosides/tides, bases; siderophores; other.]

Sebaldella termitidis ATCC 33386 has twice as many sugar PTS transporters as any other genome

Page 48: Eisen.Geba.Jgi2009b

Novelty 4

Unusual distribution patterns

Page 49: Eisen.Geba.Jgi2009b

Shotgun Sequencing Detects More Diversity than PCR-methods

Page 50: Eisen.Geba.Jgi2009b

First Bacterial Actin Related Protein

First found by V. Kunin, Structure Analysis by Patrik D. et al

Page 51: Eisen.Geba.Jgi2009b

Most Closely Related to ARP8

Page 52: Eisen.Geba.Jgi2009b

Value of 100 diverse genomes II: Annotation

• Premise:
– Increased phylogenetic coverage should improve our ability to annotate genes in other genomes (e.g., reference/model genomes)

Page 53: Eisen.Geba.Jgi2009b

Annotation Improves

• Conversion of hypotheticals into conserved hypotheticals

• Linking distantly related members of protein families

• Non-homology functional prediction methods

Page 54: Eisen.Geba.Jgi2009b

Linking Protein Families Improved

[Figure: bar chart titled "Genes - links". Axes: Genome (one bar per GEBA genome, labeled by organism name and DSM/ATCC strain number) and Links (0 to 140).]

Page 55: Eisen.Geba.Jgi2009b

Fusion Based Predictions Improved

[Figure: bar chart titled "gene harboring new fusions (COG)". One bar per GEBA genome, labeled by organism name and DSM/ATCC strain number; value axis 0 to 35.]

Page 56: Eisen.Geba.Jgi2009b

Improving Rosetta Stone Predictions
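For readers unfamiliar with the idea, a Rosetta Stone (gene fusion) prediction links two separate genes in one genome when their families are found fused into a single protein in another genome. The sketch below is only a toy illustration of that general idea, not the analysis behind this slide; the function name, the family labels, and the input format (one family per gene in the target genome) are all assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def rosetta_stone_links(fusion_proteins: Iterable[Set[str]],
                        gene_to_family: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pair up target-genome genes whose families occur fused in one protein elsewhere."""
    # Index the target genome: family -> genes assigned to that family.
    genes_by_family: Dict[str, Set[str]] = {}
    for gene, family in gene_to_family.items():
        genes_by_family.setdefault(family, set()).add(gene)

    links: Set[Tuple[str, str]] = set()
    for families in fusion_proteins:
        # Every pair of distinct families in one fusion protein is a candidate link.
        for fam_a, fam_b in combinations(sorted(families), 2):
            for gene_a in genes_by_family.get(fam_a, ()):
                for gene_b in genes_by_family.get(fam_b, ()):
                    links.add(tuple(sorted((gene_a, gene_b))))
    return links


# Toy data (invented): one fusion protein seen in a newly sequenced genome,
# and three single-family genes in a reference genome.
fusions = [{"FAM_A", "FAM_B"}]
reference = {"geneX": "FAM_A", "geneY": "FAM_B", "geneZ": "FAM_C"}
predicted = rosetta_stone_links(fusions, reference)
print(predicted)
```

Each newly sequenced genome can contribute fusion proteins that were not observed before, which is why broader phylogenetic sampling can add predicted links in reference genomes.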

Page 57: Eisen.Geba.Jgi2009b

Value of 100 diverse genomes III: Metagenomics

• Premise:
– Increased sampling of diverse genomes should improve many aspects of metagenomic analysis

• To test:
– Annotation
– Binning

Page 58: Eisen.Geba.Jgi2009b

Metagenomic Annotation Improves (Slightly)

Page 59: Eisen.Geba.Jgi2009b

Compositional Binning Improves (Slightly)

Page 60: Eisen.Geba.Jgi2009b

Phylogenetic Binning Improves Slightly

[Figure: chart with axis ticks from 0 to 0.7. Taxonomic groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. Marker genes: dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]

Page 61: Eisen.Geba.Jgi2009b

Value of 100 diverse genomes V: Phylogeny

Page 62: Eisen.Geba.Jgi2009b

16S Says Hyphomonas Is in Rhodobacterales

Badger et al. 2005

Page 63: Eisen.Geba.Jgi2009b

WGT Says It's Related to Caulobacterales

Badger et al. 2005

Page 64: Eisen.Geba.Jgi2009b

Page 65: Eisen.Geba.Jgi2009b

Page 66: Eisen.Geba.Jgi2009b

GEBA - After the Pilot

Page 67: Eisen.Geba.Jgi2009b

PD of sequenced organisms

Page 68: Eisen.Geba.Jgi2009b

PD with GEBA
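
The slides give only the label "PD". Assuming it stands for phylogenetic diversity, measured as the total branch length of the tree spanned by a set of organisms (including the path to the root), a minimal sketch of the calculation looks like this. The tree encoding, the function name `phylogenetic_diversity`, and the toy tree are hypothetical and are not data from the talk.

```python
def phylogenetic_diversity(parent, tips):
    """Sum the branch lengths on the union of root-to-tip paths.

    parent maps each non-root node to (parent_node, branch_length).
    Each branch is counted once, however many tips share it.
    """
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root until reaching it or an already counted branch.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Toy tree (illustrative only): A, A2 and B are close relatives under X;
# C sits alone on a long branch from the root.
toy_tree = {
    "X": ("root", 1.0),
    "A": ("X", 0.5),
    "A2": ("X", 0.5),
    "B": ("X", 0.5),
    "C": ("root", 2.0),
}

pd_start = phylogenetic_diversity(toy_tree, ["A", "B"])
gain_close = phylogenetic_diversity(toy_tree, ["A", "B", "A2"]) - pd_start
gain_distant = phylogenetic_diversity(toy_tree, ["A", "B", "C"]) - pd_start
```

In this toy case, adding the distant lineage C increases PD more than adding the close relative A2, which is the intuition behind choosing genomes by phylogenetic novelty.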

Page 69: Eisen.Geba.Jgi2009b

Page 70: Eisen.Geba.Jgi2009b

[Figure: phylum-level tree of bacteria. Lineage labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Most phyla with cultured species are sparsely sampled

• Lineages with no cultured taxa even more poorly sampled

Well sampled phyla

Poorly sampled

No cultured taxa

Page 71: Eisen.Geba.Jgi2009b

[Figure: phylum-level tree of bacteria, with the same lineage labels as on Page 70]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Based on Hugenholtz, 2002

Page 72: Eisen.Geba.Jgi2009b

[Figure: phylum-level tree of bacteria, with the same lineage labels as on Page 70]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Microbial Eukaryotes

As of 2002

Based on Hugenholtz, 2002

Page 73: Eisen.Geba.Jgi2009b

[Figure: phylum-level tree of bacteria, with the same lineage labels as on Page 70; scale bar 0.1]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Page 74: Eisen.Geba.Jgi2009b

Page 75: Eisen.Geba.Jgi2009b

MICROBES

Page 76: Eisen.Geba.Jgi2009b

A Happy Tree of Life
