
Transcript
Page 1: Rob Edwards phage.sdsu/~rob San Diego State University

Challenges for metagenomic data analysis and lessons from viral metagenomes

[What would you do if sequencing were free?]

Rob Edwards

http://phage.sdsu.edu/~rob

San Diego State University

Fellowship for Interpretation of Genomes

SGM Meeting, Warwick, April 2006

Page 2:

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Page 3:

This is all 454 sequence data

• 21 libraries
– 10 microbial, 11 phage

• 597,340,328 bp total
– 20% of the human genome
– 50% of all complete and partial microbial genomes

• 5,769,035 sequences
– Average 274,716 per library

• Average read length 103.5 bp
– Av. read length has not increased in 7 months

• Cost 0.04¢ per bp
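The averages on this slide follow directly from the totals. A minimal sketch of that arithmetic, using only the numbers quoted above; the total cost is derived here from the per-bp price and was not quoted in the talk:

```python
# Figures quoted on the slide.
total_bp = 597_340_328
total_reads = 5_769_035
libraries = 21
cost_cents_per_bp = 0.04

# Derived values.
reads_per_library = total_reads / libraries
mean_read_length = total_bp / total_reads
# Derived here, not quoted in the talk: total cost at the per-bp price.
total_cost_dollars = total_bp * cost_cents_per_bp / 100

print(f"reads per library: {reads_per_library:,.0f}")
print(f"mean read length:  {mean_read_length:.1f} bp")
print(f"implied total cost: ${total_cost_dollars:,.0f}")
```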

Page 4:

Sequencing is cheap and easy.

Bioinformatics is neither.

Page 5:

The Soudan Mine, Minnesota

Red Stuff: Oxidized

Black Stuff: Reduced

Page 6:

Red and Black Samples Are Different

Cloned and 454-sequenced 16S are indistinguishable

[Figure labels: Black stuff; Red; Cloned Red]

Page 7:

There are different amounts of metabolism in each environment

Page 8:

There are different amounts of substrates in each environment

[Figure labels: Black Stuff; Red Stuff]

Page 9:

But are the differences significant?

• Sample 10,000 proteins from site 1
• Count frequency of each “subsystem”
• Repeat 20,000 times

• Repeat for sample 2

• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI

• Compare medians from sites 1 and 2 with 95% CI

Rodriguez-Brito (2006). BMC Bioinformatics
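The resampling steps on this slide can be sketched roughly as below. This is a minimal illustration, not the published implementation. It makes several assumptions the slide does not state: proteins are drawn with replacement, each site is a plain list of subsystem labels, and the pooled 95% interval is built from the difference between two pooled resamples. All function names are hypothetical.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, sample_size, rng):
    """Draw sample_size proteins (with replacement) and count each subsystem."""
    return Counter(rng.choices(proteins, k=sample_size))


def median_counts(proteins, subsystems, sample_size, repeats, rng):
    """Median count of each subsystem over `repeats` resamples of one site."""
    draws = [resample_counts(proteins, sample_size, rng) for _ in range(repeats)]
    return {s: median(d[s] for d in draws) for s in subsystems}


def pooled_interval(site1, site2, subsystems, sample_size, repeats, rng, level=0.95):
    """Interval for count differences expected if both sites were one pool.

    Assumption: each repeat draws two resamples from the pooled proteins and
    records their per-subsystem difference; the slide leaves this step open.
    """
    pooled = site1 + site2
    diffs = {s: [] for s in subsystems}
    for _ in range(repeats):
        a = resample_counts(pooled, sample_size, rng)
        b = resample_counts(pooled, sample_size, rng)
        for s in subsystems:
            diffs[s].append(a[s] - b[s])
    tail = (1 - level) / 2
    interval = {}
    for s, values in diffs.items():
        values.sort()
        low = values[int(tail * (repeats - 1))]
        high = values[int(round((1 - tail) * (repeats - 1)))]
        interval[s] = (low, high)
    return interval


def compare_sites(site1, site2, sample_size=10_000, repeats=20_000, seed=0):
    """Flag subsystems whose median difference falls outside the pooled interval."""
    rng = random.Random(seed)
    subsystems = sorted(set(site1) | set(site2))
    m1 = median_counts(site1, subsystems, sample_size, repeats, rng)
    m2 = median_counts(site2, subsystems, sample_size, repeats, rng)
    ci = pooled_interval(site1, site2, subsystems, sample_size, repeats, rng)
    result = {}
    for s in subsystems:
        diff = m1[s] - m2[s]
        low, high = ci[s]
        result[s] = {"difference": diff, "ci": (low, high),
                     "significant": diff < low or diff > high}
    return result


# Toy data (made up for illustration); small sizes keep the run fast.
site1 = ["iron"] * 300 + ["nitrogen"] * 100 + ["other"] * 600
site2 = ["iron"] * 100 + ["nitrogen"] * 100 + ["other"] * 800
for name, row in compare_sites(site1, site2, sample_size=500, repeats=200).items():
    print(name, row)
```

The defaults match the 10,000 proteins and 20,000 repeats on the slide. In pure Python that is slow, so the toy call uses much smaller values.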

Page 10:

Subsystem differences & metabolism

Iron acquisition

Black Stuff

Siderophore enterobactin biosynthesis
Ferric enterobactin transport
ABC transporter ferrichrome
ABC transporter heme

Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈])

Red stuff: ferric iron (goethite [FeO(OH)])

Page 11:

Nitrification differentiates the samples

Edwards (2006). BMC Genomics

Page 12:

The challenge is explaining the differences between samples

Red Sample

Arg, Trp, His
Ubiquinone
FA oxidation
Chemotaxis, Flagella
Methylglyoxal metabolism

Black Sample

Ile, Leu, Val
Siderophores
Glycerolipids
NiFe hydrogenase
Phenylpropionate degradation

Page 13:

We can cheaply compare the important biochemistry happening in different environments

We don’t care which organisms are doing the metabolism, but we know what organisms are there

Page 14:

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Page 15:

Why Phages?

• Phages are viruses that infect bacteria
– 10:1 ratio of phages:bacteria
– 10^31 phages on the planet

• Specific interactions (probably)
– one virus : one host

• Small genome size
– Higher coverage

• Horizontal gene transfer
– 10^25-10^28 bp DNA per year in the oceans

• Can’t do fosmids

Page 16:

Phages In The World's Oceans

GOM: 41 samples, 13 sites, 5 years

SAR: 1 sample, 1 site, 1 year

BBC: 85 samples, 38 sites, 8 years

ARC: 56 samples, 16 sites, 1 year

LI: 4 sites, 1 year

Page 17:

Most Marine Phage Sequences are Novel

Page 18:

Thanks: Mya Breitbart

Phages are specific to environments

Phage Proteomic Tree v. 5 (Edwards, Rohwer)

[Figure labels: ssDNA; λ-like; T7-like; T4-like]

Page 19:

Marine Single-Stranded DNA Viruses

• 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)

• 40% of viral particles in SAR are ssDNA phage

• Several full-genome sequences were recovered via de novo assembly of these fragments

• Confirmed by PCR and sequencing

Page 20:

SAR Aligned Against the Chlamydia phi 4 Genome

12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

[Figure: individual sequence reads and concatenated hits plotted along the Chlamydia phi 4 genome (position labels 3890 bp and 4490 bp); coverage axis runs from 0 to 1033]
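A coverage track like the one in this figure can be computed from the alignment coordinates of the hits. The sketch below is an illustration only: the function name and the example hits are hypothetical, and it assumes 1-based inclusive coordinates in which reverse-strand hits may be reported with start greater than end.

```python
def coverage_profile(hits, genome_length):
    """Per-base coverage from (start, end) hits on a genome of given length.

    Coordinates are assumed to be 1-based and inclusive. Hits reported with
    start > end (reverse strand) are flipped; hits are clipped to the genome.
    """
    delta = [0] * (genome_length + 2)  # difference array, index 0 unused
    for start, end in hits:
        low, high = sorted((start, end))
        low = max(low, 1)
        high = min(high, genome_length)
        if low > high:
            continue  # hit lies entirely outside the genome
        delta[low] += 1
        delta[high + 1] -= 1

    coverage = []
    depth = 0
    for position in range(1, genome_length + 1):
        depth += delta[position]
        coverage.append(depth)
    return coverage


# Made-up hits on a 10 bp toy genome; the third hit is on the reverse strand.
profile = coverage_profile([(1, 5), (3, 8), (10, 7)], genome_length=10)
print(profile, "max depth:", max(profile))
```

The difference array keeps the cost proportional to the number of hits plus the genome length, rather than to the total number of aligned bases.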

Page 21:

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Page 22:

Phages, Reefs, and Human Disturbance

Page 23:

Phages, Reefs, and Human Disturbance

The Northern Line Islands Expedition, 2005

[Map labels: Christmas; Kingman; Palmyra; Washington; Fanning]

Page 24:

Christmas to Kingman Bias in No. Phage Hosts

Negative numbers mean relatively more phage hosts at Kingman

More pathogens at Christmas.
More people at Christmas.

More photosynthesis at Kingman.
No people at Kingman.

Page 25:

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Page 26:

Phages enrich for important genes

Rios Mesquites Stromatolites
• No photosynthesis genes in phages

Pozas Azules Stromatolites
• 5 different photosynthesis genes in phages

Page 27:

RNR is the most successful reaction in evolution

Page 28:

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Page 29:

Computational Challenges

• Sequence annotations and analysis

– What is there?

– What is it doing?

– How is it doing it?

• Gene predictions in unknowns

– Lutz Krause (Bielefeld)

• Sequence comparisons

– BLAST

– Other ways to rapidly compare short sequences

– What happens when everyone is using 454 sequencing?

Page 30:

Sequence data from 21 libraries

6 million sequences
600 million bp

• Each BLASTX search takes 1,000 CPU hours
• 21 libraries = 21,000 CPU hours or 2.4 CPU years
• Users want
– repeat runs
– TBLASTX
– more analysis
– more data
– more, more, more, more
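The compute estimate on this slide is simple arithmetic and can be reproduced as below. The 1,000 CPU hours per search and the 21 libraries come from the slide. The wall-clock helper and the 100-CPU cluster size are hypothetical additions, and they assume the work splits evenly across CPUs.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def total_cpu_hours(libraries, hours_per_library):
    """CPU hours needed to search every library once."""
    return libraries * hours_per_library


def cpu_years(cpu_hours):
    """Convert CPU hours to CPU years."""
    return cpu_hours / HOURS_PER_YEAR


def wall_clock_days(cpu_hours, n_cpus):
    """Elapsed days if the work divides evenly over n_cpus (an idealisation)."""
    return cpu_hours / n_cpus / 24


hours = total_cpu_hours(libraries=21, hours_per_library=1_000)
print(f"{hours:,} CPU hours = {cpu_years(hours):.1f} CPU years")
# 100 CPUs is a hypothetical cluster size, not a figure from the talk.
print(f"{wall_clock_days(hours, n_cpus=100):.2f} days on 100 CPUs")
```

Every repeat run or additional search type, such as TBLASTX, adds its own cost on top of this total.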

Page 31:

SDSU: Forest Rohwer, Beltran Rodriguez-Brito

USF: Mya Breitbart

Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes

Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)

Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller

MIT: Ed DeLong

FIG: Veronika Vonstein, Ross Overbeek, Annotators

ANL: Rick Stevens, Bob Olsen, CI Support

Also at SDSU: Anca Segall, Stanley Maloy

UBC: Curtis Suttle, Amy Chan
