Page 1: What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics Rob Edwards rob Fellowship.

What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics

Rob Edwards
http://phage.sdsu.edu/~rob

Fellowship for Interpretation of Genomes, San Diego State University, Burnham Institute for Medical Research, IMEC, LLC

SIO, San Diego, May 2006

Page 2:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 3:

The Players

• FIG: Fellowship for Interpretation of Genomes

• NMPDR: Natl. Microbial Pathogen Data Resource

• BRC: NIH Bioinformatics Resource Centers

• SEED: The SEED database.

Page 7:

How Many Genomes Have Been Sequenced?

            Complete   Draft   Total
Archaea           26      12      38
Bacteria         342     238     580
Eukarya           29     533     562

Page 8:

When will the 1,000th microbial genome be sequenced?

[Figure: complete genomes (y-axis, 1,000 to 5,000) plotted by year (x-axis, 1996 to 2008)]

Page 9:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 10:

http://theseed.uchicago.edu/FIG/index.cgi

The SEED database developed by FIG

Current version:

580 Bacteria (342 complete)
38 Archaea (26 complete)
562 Eukarya (29 complete)
1335 Viruses
2 Environmental Genomes

Page 11:

The problem:

How do you generate consistent annotations for 1,000 genomes?

Page 12:

Basic biology

[Figure labels: lacZ, lacI, lacY, lacA]

Page 13:

Different types of clustering

< 80 % < 80 % < 80%

Page 14:

Occurrence of clustering in different genomes

[Figure: chart by taxonomic group (Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria, Average). Left axis: fraction of genes in clusters (0 to 1). Right axis: number of genomes (0 to 120). Legend: clusters of genes w/ maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]

Page 15:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 16:

The Subsystems Approach to Annotation

• Subsystem is a generalization of “pathway”
  – collection of functional roles jointly involved in a biological process or complex

• Functional Role is the abstract biological function of a gene product
  – atomic, or user-defined, examples:
    • 6-phosphofructokinase (EC 2.7.1.11)
    • LSU ribosomal protein L31p
    • Streptococcal virulence factors
  – Does not contain “putative”, “thermostable”, etc.

• Populated subsystem is complete spreadsheet of functions and roles

Page 17:

Subsystems developed based on

• Wet lab
• Chromosomal context
• Metabolic context
• Phylogenetic context
• Microarray data
• Proteomics data
• …

Page 18:

Example Subsystem: Histidine Degradation

Subsystem: Histidine Degradation

1 HutH Histidine ammonia-lyase (EC 4.3.1.3)
2 HutU Urocanate hydratase (EC 4.2.1.49)
3 HutI Imidazolonepropionase (EC 3.5.2.7)
4 GluF Glutamate formiminotransferase (EC 2.1.2.5)
5 HutG Formiminoglutamase (EC 3.5.3.8)
6 NfoD N-formylglutamate deformylase (EC 3.5.1.68)
7 ForI Formiminoglutamic iminohydrolase (EC 3.5.3.13)

• Conversion of histidine to glutamate
• Functional roles defined in table
• Inclusion in subsystem is only by functional role
• Controlled vocabulary …

Page 19:

Subsystem Spreadsheet

• Column headers taken from table of functional roles
• Rows are selected genomes or organisms
• Cells are populated with specific, annotated genes
• Functional variants defined by the annotated roles
• Variant code -1 indicates subsystem is not functional
• Clustering shown by color

Organism Variant HutH HutU HutI GluF HutG NfoD ForI

Bacteroides thetaiotaomicron 1 Q8A4B3 Q8A4A9 Q8A4B1 Q8A4B0
Desulfotalea psychrophila 1 gi51246205 gi51246204 gi51246203 gi51246202
Halobacterium sp. 2 Q9HQD5 Q9HQD8 Q9HQD6 Q9HQD7
Deinococcus radiodurans 2 Q9RZ06 Q9RZ02 Q9RZ05 Q9RZ04
Bacillus subtilis 2 P10944 P25503 P42084 P42068
Caulobacter crescentus 3 P58082 Q9A9MI P58079 Q9A9M0 Q9A9L9
Pseudomonas putida 3 Q88CZ7 Q88CZ6 Q88CZ9 Q88D00 Q88CZ3
Xanthomonas campestris 3 Q8PAA7 P58988 Q8PAA6 Q8PAA8 Q8PAA5
Listeria monocytogenes -1

Page 20:

“The Populated Subsystem”

[Slide shows the Histidine Degradation functional-role table (Page 18) together with the subsystem spreadsheet (Page 19).]

Page 21:

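Subsystem Diagram

• Three functional variants
• Universal subset has three roles, followed by three alternative paths from IV to VI
• No ForI known experimentally

www.nmpdr.org

[Figure: pathway diagram with intermediates I to VI. HutH, HutU and HutI carry out the shared steps I → II → III → IV. The alternative paths from IV to VI are HutG (formamide released), GluF (tetrahydrofolate to formiminotetrahydrofolate), and ForI followed by NfoD through intermediate V. H2O and NH3 are shown as co-substrates and by-products.]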
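The rule "functional variants defined by the annotated roles" can be sketched in a few lines of Python. This is an illustration only, not SEED code. The slides give numeric variant codes (1, 2, 3, -1) but do not say which code belongs to which path, so the sketch labels variants by path name; those labels, the function name classify_variant, and the choice to return -1 when only the three shared roles are present are all assumptions.

```python
# Roles shared by every functional variant (the "universal subset" on the slide).
CORE_ROLES = frozenset({"HutH", "HutU", "HutI"})

# Alternative paths from intermediate IV to VI. The labels are hypothetical:
# the slides do not map the numeric variant codes 1-3 onto these paths.
ALTERNATIVE_PATHS = {
    "HutG": frozenset({"HutG"}),
    "GluF": frozenset({"GluF"}),
    "ForI+NfoD": frozenset({"ForI", "NfoD"}),
}

NOT_FUNCTIONAL = -1  # matches the slide's "variant code -1"


def classify_variant(roles_present):
    """Return a path label for a genome, or -1 if no complete path is annotated."""
    roles = set(roles_present)
    # Without all three shared roles the subsystem cannot be functional.
    if not CORE_ROLES <= roles:
        return NOT_FUNCTIONAL
    # Assumption: the first complete alternative path found decides the variant.
    for label, needed in ALTERNATIVE_PATHS.items():
        if needed <= roles:
            return label
    return NOT_FUNCTIONAL
```

For example, a genome annotated with HutH, HutU, HutI, ForI and NfoD is assigned the "ForI+NfoD" label, while a genome with none of the roles gets -1, as Listeria monocytogenes does in the spreadsheet.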

Page 22:

Subsystem Spreadsheet

• Prediction from subsystems confirmed experimentally


Page 23:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 24:

How do bacteria make methionine?

acquire homoserine

convert cysteine to cystathionine

convert cystathionine to homocysteine

acquire met or convert homocysteine to methionine

sulfur and acetylhomoserine sulfhydrylase

Page 25:

Column groups: Homoserine activation, Sulfhydrylation, Transsulfuration, Methylation

Columns: Organism, Variant Code, HSDH, HK, HSST, HSAT, AHSH/SHSH, CTGS, CTBL, MetH, MetE, BhmT, MTHFR

Nostoc sp. PCC 7120: 0; 4427 657 619 1093
Synechocystis sp. PCC 6803: 0; 2356 1112 2469 1144
Thermosynechococcus elongatus BP-1: 0; 277 1764 1027 1090 1770
Trichodesmium erythraeum IMS101: 0; 415, 4266 6167106, 1229 2279 4433
Gloeobacter violaceus PCC 7421: 0; 4295 1127 2500 477 789
Anabaena variabilis ATCC 29413: 33; 2331 5519 3872 3873 4254, 6365 6434
Nostoc punctiforme: 33; 2895 6648 5301 5302 4055 1885
Prochlorococcus marinus MED4: 66; 1204 1764 1714 1715 2 1 1421 295
Prochlorococcus marinus str. MIT 9313: 66; 1141 426 875 874 225 226 728 2005
Prochlorococcus marinus subsp. marinus str. CCMP1375: 66; 1148 1064 799 798 404 405 957 176
Prochlorococcus marinus subsp. pastoris str. CCMP1986: 66; 1047 592 640 639 405 406 874 153
Synechococcus sp. WH 8102: 66; 706 1476 845 846 669 670 1233 2258
Synechococcus elongatus PCC 7942: 0; 1397 769 2172 1030 2173 702 639

Page 26:

[Same table as Page 25, with two cells marked "?" and labeled "Missing genes".]

Page 27:

Cyanoseed: http://cyanoseed.theFIG.info

Page 28:

Marineseed: http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine

Page 29:

Subsystems are not just for gene clusters

• predicted or measured co-regulation
• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• cellular localization
• enzymatic activity
• common phenotype
• combinations of criteria

Page 30:

How much progress has been made?

• 541 subsystems encoded

• 80 – 85% of the genes in core machinery are contained in subsystems

• 30 – 35% of genes in NMPDR organism genomes are contained in subsystems

• 20 – 30% of genes in other genomes are contained in subsystems

Page 31:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 32:

Metagenomics

200 liters water or 5-500 g fresh fecal matter
→ Concentrate and purify viruses
→ Epifluorescent microscopy
→ Extract nucleic acids
→ DNA/RNA LASL
→ Sequence

Breitbart et al., multiple papers

Page 33:

Control datasets for metagenome comparisons

Number of proteins in different datasets

Bacteria                                          952,758
Archaea                                            49,694
Eukarya                                           259,653
Acid mine                                           7,588
Sargasso (without Shewanella, Burkholderia)       960,561
Sorcerer II                                   ~13,000,000

Page 34:

Subsystems per million CDS
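Scaling to a common denominator is what makes datasets of very different sizes comparable (7,588 proteins for the acid mine versus roughly 13,000,000 for Sorcerer II). A minimal sketch of that normalization follows; the function name per_million_cds and the hit counts in the usage note are hypothetical, and only the dataset sizes come from the slides.

```python
def per_million_cds(subsystem_hits, total_cds):
    """Scale raw subsystem hit counts to hits per million coding sequences.

    subsystem_hits: dict mapping subsystem name -> number of CDS assigned to it
    total_cds: total number of CDS (proteins) in the dataset
    """
    if total_cds <= 0:
        raise ValueError("total_cds must be positive")
    return {
        name: hits * 1_000_000 / total_cds
        for name, hits in subsystem_hits.items()
    }
```

For instance, per_million_cds({"Histidine Degradation": 120}, 960_561) would express a made-up count of 120 hits in the Sargasso set on the same scale as a count taken from the 952,758-protein Bacteria set.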

Page 35:

Determination of Statistical Differences Between Metagenomes

• Take 10,000 proteins from sample 1
• Count frequency of each subsystem
• Repeat 20,000 times
• Repeat for sample 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
• Compare medians from samples 1 and 2 with 95% CI

Rodriguez-Brito (2006). BMC Bioinformatics
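The resampling steps listed on this slide can be sketched as follows. This is one reading of the bullets, not the published implementation: it assumes each protein is represented by the name of the subsystem it was assigned to, that draws are made with replacement, and that a subsystem counts as different when either sample's median falls outside the pooled 95% percentile interval. All function names are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, n_draw, n_rep, rng):
    """Draw n_draw proteins (with replacement) n_rep times.

    Returns a dict: subsystem -> list of counts, one count per repetition.
    """
    subsystems = sorted(set(proteins))
    counts = {s: [] for s in subsystems}
    for _ in range(n_rep):
        tally = Counter(rng.choices(proteins, k=n_draw))
        for s in subsystems:
            counts[s].append(tally.get(s, 0))
    return counts


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval (95% by default) from resampled values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


def compare_samples(sample1, sample2, n_draw=10_000, n_rep=20_000, seed=0):
    """Compare per-subsystem medians of two samples with a pooled 95% interval."""
    rng = random.Random(seed)
    counts1 = resample_counts(sample1, n_draw, n_rep, rng)
    counts2 = resample_counts(sample2, n_draw, n_rep, rng)
    pooled = resample_counts(sample1 + sample2, n_draw, n_rep, rng)
    result = {}
    for subsystem, values in pooled.items():
        low, high = percentile_interval(values)
        # A subsystem absent from one sample has a median count of 0 there.
        m1 = median(counts1.get(subsystem, [0]))
        m2 = median(counts2.get(subsystem, [0]))
        result[subsystem] = {
            "median1": m1,
            "median2": m2,
            "ci": (low, high),
            "differs": not (low <= m1 <= high and low <= m2 <= high),
        }
    return result
```

The defaults mirror the slide's 10,000 proteins and 20,000 repetitions, which is slow in pure Python; passing smaller values for n_draw and n_rep is enough to try the idea on toy data.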

Page 36:

Sampling Sargasso and “SEED” metagenomes

Page 37:

Comparison of all Subsystems

[Figure: subsystems split into "More in Sargasso" and "More in SEED"]

Page 38:

Is serine being used as an osmolyte?

• Few trehalose, proline, sucrose synthetic genes

• Serine is most abundant amino acid in ocean (Suttle, Keil)

• Serine is more effective osmoprotectant than glycine betaine (Yancey)

Page 39:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 40:

Metagenomics

200 liters water or 5-500 g fresh fecal matter
→ Concentrate and purify viruses
→ Epifluorescent microscopy
→ Extract nucleic acids
→ DNA/RNA LASL (labeled "So 2004") or 454
→ Sequence

Breitbart et al., multiple papers

Page 41:

454 Sequence Data (Only from Rohwer Lab, in one year)

• 42 libraries
  – 22 microbial, 20 phage

• 1,028,563,420 bp total
  – 33% of the human genome
  – 95% of all complete and partial bacterial genomes
  – 10% of community sequencing of JGI per year

• 9,933,184 sequences
  – Average 236,511 per library

• Average read length 103.5 bp
  – Av. read length has not increased in 12 months

Page 42:

The Soudan Mine, Minnesota

Red Stuff: Oxidized
Black Stuff: Reduced

Page 43:

Red and Black Samples Are Different

Cloned and 454 sequenced 16S are indistinguishable

[Figure labels: Black stuff, Red, Cloned Red]

Page 44:

There are different amounts of metabolism in each environment

Page 45:

There are different amounts of substrates in each environment

[Figure labels: Black Stuff, Red Stuff]

Page 46:

But are the differences significant?

• Sample 10,000 proteins from site 1
• Count frequency of each “subsystem”
• Repeat 20,000 times
• Repeat for site 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
• Compare medians from sites 1 and 2 with 95% CI

Rodriguez-Brito (2006). BMC Bioinformatics

Page 47:

Subsystem differences & metabolism

Iron acquisition (Black Stuff)

• Siderophore enterobactin biosynthesis
• Ferric enterobactin transport
• ABC transporter ferrichrome
• ABC transporter heme

Black stuff: ferrous iron (Fe2+, ferroan [(Mg,Fe)6(Si,Al)4O10(OH)8])

Red stuff: ferric iron (goethite [FeO(OH)])

Page 48:

Nitrification differentiates the samples

Edwards (2006). BMC Genomics

Page 49:

The challenge is explaining the differences between samples

Red Sample
• Arg, Trp, His
• Ubiquinone
• FA oxidation
• Chemotaxis, Flagella
• Methylglyoxal metabolism

Black Sample
• Ile, Leu, Val
• Siderophores
• Glycerolipids
• NiFe hydrogenase
• Phenylpropionate degradation

Page 50:

We can cheaply compare the important biochemistry happening in different environments

We don’t care which organisms are doing the metabolism but we know what organisms are there

Page 51:

Outline

• Sequencing statistics scare skeptics

• The SEED database

• Some simply stunning Subsystems

• Mysterious missing methionine metabolism

• Marine metabolism mined from metagenomics

• Fabulous four-five-four for facile functional findings

• Marine phage most puzzling

Page 52:

Phages In The World's Oceans

GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
BBC: 85 samples, 38 sites, 8 years
ARC: 56 samples, 16 sites, 1 year
LI: 4 sites, 1 year

Page 53:

Phages, Reefs, and Human Disturbance

The Northern Line Islands Expedition, 2005

[Map labels: Christmas, Kingman, Palmyra, Washington, Fanning]

Page 54:

16S rDNA at each island

Page 55:

16S rDNA of the Proteobacteria

Page 56:

Phages at each island

Page 57:

Christmas to Kingman Bias in No. Phage Hosts

Negative numbers mean relatively more phage hosts at Kingman

Page 58:

Phages In The World's Oceans

GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
BBC: 85 samples, 38 sites, 8 years
ARC: 56 samples, 16 sites, 1 year
LI: 4 sites, 1 year

Page 59:

Most Marine Phage Sequences are Novel

Page 60:

Phages are specific to environments

PhageProteomicTree v. 5 (Edwards, Rohwer)

[Figure labels: ssDNA, -like, T7-like, T4-like]

Thanks: Mya Breitbart

Page 61:

Marine Single-Stranded DNA Viruses

• 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)

• 40% of viral particles in SAR are ssDNA phage

• Several full-genome sequences were recovered via de novo assembly of these fragments

• Confirmed by PCR and sequencing

Page 62:

SAR Aligned Against the Chlamydia phi 4 Genome

12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

[Figure: individual sequence reads and concatenated hits drawn along the Chlamydia phi 4 genome (positions 3890 bp and 4490 bp marked), with a coverage axis running from 0 to 1033]
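A coverage track like the one on this slide can be derived from the alignment coordinates of each hit. The sketch below is an assumed approach, not the code behind the figure: it takes hits as (start, end) pairs in 1-based inclusive genome coordinates, such as could be parsed from tabular TBLASTX output, and counts how many hits overlap each position. The function name coverage_profile is hypothetical.

```python
def coverage_profile(genome_length, hits):
    """Return per-position coverage for a genome of genome_length bases.

    hits: iterable of (start, end) pairs, 1-based and inclusive. Reverse-strand
    hits, where start > end, are normalized before counting.
    """
    # Difference array: +1 where a hit begins, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in hits:
        lo, hi = sorted((start, end))
        lo = max(lo, 1)
        hi = min(hi, genome_length)
        if lo > hi:
            continue  # hit lies entirely outside the genome
        diff[lo] += 1
        diff[hi + 1] -= 1

    # The running sum of the difference array is the depth at each position.
    coverage = []
    depth = 0
    for pos in range(1, genome_length + 1):
        depth += diff[pos]
        coverage.append(depth)
    return coverage
```

Calling max(coverage_profile(length, hits)) gives the peak depth, which is the kind of number the coverage axis on the slide tops out at.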

Page 63:

Summary

You only need to remember:

• Subsystems are the best way to annotate genomes

• 454 generates lots of data

• We can use subsystems to find out what is going on in the environment

Page 64:

SDSU: Forest Rohwer, Beltran Brito-Rodriguez, Linda Wegley

USF: Mya Breitbart

University of Bielefeld: Folker Meyer, Lutz Krause

FIG: Veronika Vonstein, Ross Overbeek, Gordon Pusch

ANL: Rick Stevens, Bob Olsen, Terry Disz

Annotators: Gary Olsen, Andrei Ostermann, Olga Zagnitko, Olga Vassieva, Svetlana Gerdes, Ramy Aziz

UBC: Curtis Suttle, Amy Chan