Rob Edwards phage.sdsu/~rob San Diego State University

Post on 12-Jan-2016


SGM Meeting, Warwick, April 2006. Challenges for metagenomic data analysis and lessons from viral metagenomes [What would you do if sequencing were free?]. Rob Edwards, http://phage.sdsu.edu/~rob, San Diego State University; Fellowship for Interpretation of Genomes.


Challenges for metagenomic data analysis and lessons from viral metagenomes

[What would you do if sequencing were free?]

Rob Edwards

http://phage.sdsu.edu/~rob

San Diego State University

Fellowship for Interpretation of Genomes

SGM Meeting, Warwick, April 2006

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

This is all 454 sequence data

• 21 libraries

– 10 microbial, 11 phage

• 597,340,328 bp total

– 20% of the human genome

– 50% of all complete and partial microbial genomes

• 5,769,035 sequences

– Average 274,716 per library

• Average read length 103.5 bp

– Av. read length has not increased in 7 months

• Cost 0.04¢ per bp
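The summary figures on this slide follow from the totals by simple arithmetic. The short Python sketch below recomputes them as a sanity check; the variable names are illustrative, and the total cost is derived here from the per-bp figure rather than stated on the slide.

```python
# Totals as given on the slide.
TOTAL_BP = 597_340_328
TOTAL_READS = 5_769_035
LIBRARIES = 21
COST_CENTS_PER_BP = 0.04

# Derived quantities.
mean_read_length = TOTAL_BP / TOTAL_READS            # bp per read
reads_per_library = TOTAL_READS / LIBRARIES          # reads per library
total_cost_dollars = TOTAL_BP * COST_CENTS_PER_BP / 100  # cents -> dollars

print(f"Mean read length: {mean_read_length:.1f} bp")
print(f"Reads per library: {reads_per_library:,.0f}")
print(f"Implied total cost: ${total_cost_dollars:,.0f}")
```

The first two values match the slide (103.5 bp and about 274,716 reads per library); the implied total cost is only as accurate as the rounded 0.04¢ per bp figure.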

Sequencing is cheap and easy.

Bioinformatics is neither.

The Soudan Mine, Minnesota

Red Stuff: Oxidized

Black Stuff: Reduced

Red and Black Samples Are Different

Cloned and 454-sequenced 16S are indistinguishable

[Figure panel labels: Black stuff, Red, Cloned Red]

There are different amounts of metabolism in each environment

There are different amounts of substrates in each environment

[Figure panel labels: Black Stuff, Red Stuff]

But are the differences significant?

• Sample 10,000 proteins from site 1

• Count frequency of each “subsystem”

• Repeat 20,000 times

• Repeat for sample 2

• Combine both samples

• Sample 10,000 proteins 20,000 times

• Build 95% CI

• Compare medians from sites 1 and 2 with 95% CI

Rodriguez-Brito (2006). BMC Bioinformatics
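The resampling procedure above can be sketched in Python as follows. This is a minimal illustration, not the code from Rodriguez-Brito (2006): the function names, the nearest-rank percentile, and the choice to build the 95% CI from differences between pairs of pooled resamples are all assumptions made for the sketch. Each input is a list with one subsystem label per protein.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, sample_size, repeats, rng):
    """Draw `repeats` samples with replacement; count subsystems in each."""
    return [Counter(rng.choices(proteins, k=sample_size)) for _ in range(repeats)]


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted list."""
    index = round(fraction * (len(sorted_values) - 1))
    return sorted_values[index]


def compare_sites(site1, site2, sample_size=10_000, repeats=20_000, seed=0):
    """Return {subsystem: (median difference, CI low, CI high, significant)}."""
    rng = random.Random(seed)
    subsystems = sorted(set(site1) | set(site2))

    # Median count of each subsystem across the resamples of each site.
    counts1 = resample_counts(site1, sample_size, repeats, rng)
    counts2 = resample_counts(site2, sample_size, repeats, rng)

    # Null distribution (assumption): differences between pairs of resamples
    # drawn from the combined data, where site labels carry no information.
    pooled = list(site1) + list(site2)
    null_a = resample_counts(pooled, sample_size, repeats, rng)
    null_b = resample_counts(pooled, sample_size, repeats, rng)

    results = {}
    for s in subsystems:
        diff = median(c[s] for c in counts1) - median(c[s] for c in counts2)
        null_diffs = sorted(a[s] - b[s] for a, b in zip(null_a, null_b))
        low = percentile(null_diffs, 0.025)
        high = percentile(null_diffs, 0.975)
        results[s] = (diff, low, high, diff < low or diff > high)
    return results
```

The defaults mirror the numbers on the slide (10,000 proteins, 20,000 repeats) and are slow in pure Python; smaller values such as `compare_sites(site1, site2, sample_size=200, repeats=300)` are enough to try the idea on toy data.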

Subsystem differences & metabolism

Iron acquisition

Black Stuff

• Siderophore enterobactin biosynthesis

• ferric enterobactin transport

• ABC transporter ferrichrome

• ABC transporter heme

Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈])

Red stuff: ferric iron (goethite [FeO(OH)])

Nitrification differentiates the samples

Edwards (2006). BMC Genomics

The challenge is explaining the differences between samples

Red Sample

• Arg, Trp, His

• Ubiquinone

• FA oxidation

• Chemotaxis, Flagella

• Methylglyoxal metabolism

Black Sample

• Ile, Leu, Val

• Siderophores

• Glycerolipids

• NiFe hydrogenase

• Phenylpropionate degradation

We can cheaply compare the important biochemistry happening in different environments

We don’t care which organisms are doing the metabolism, but we know what organisms are there

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Why Phages?

• Phages are viruses that infect bacteria

– 10:1 ratio of phages:bacteria

– 10³¹ phages on the planet

• Specific interactions (probably)

– one virus : one host

• Small genome size

– Higher coverage

• Horizontal gene transfer

– 10²⁵ to 10²⁸ bp DNA per year in the oceans

• Can’t do fosmids

Phages in the World's Oceans

GOM: 41 samples, 13 sites, 5 years

SAR: 1 sample, 1 site, 1 year

BBC: 85 samples, 38 sites, 8 years

ARC: 56 samples, 16 sites, 1 year

LI: 4 sites, 1 year

Most Marine Phage Sequences are Novel

Thanks: Mya Breitbart

Phages are specific to environments

Phage Proteomic Tree v. 5 (Edwards, Rohwer)

[Tree clade labels: ssDNA, -like, T7-like, T4-like]

Marine Single-Stranded DNA Viruses

• 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)

• 40% of viral particles in SAR are ssDNA phage

• Several full-genome sequences were recovered via de novo assembly of these fragments

• Confirmed by PCR and sequencing

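12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

SAR Aligned Against the Chlamydia phi 4 Genome

[Figure: individual sequence reads and concatenated hits plotted along the Chlamydia phi 4 genome, with a coverage track; axis ticks surviving in the transcript: 3890 bp, 4490 bp, 0, 1033]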
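A coverage track like the one on this slide can be derived from the hit coordinates alone. The Python sketch below is one way to do it, assuming each hit has already been reduced to a 1-based, inclusive (start, end) interval on the reference genome; the function names and the input format are assumptions for illustration, not the pipeline used for the talk.

```python
def coverage_profile(genome_length, hits):
    """Per-base depth from 1-based inclusive (start, end) hit intervals."""
    # Difference array: +1 where a hit begins, -1 just past where it ends.
    delta = [0] * (genome_length + 2)
    for start, end in hits:
        lo, hi = sorted((start, end))      # reverse-strand hits have start > end
        lo, hi = max(lo, 1), min(hi, genome_length)
        if lo > hi:
            continue                       # hit lies entirely off the genome
        delta[lo] += 1
        delta[hi + 1] -= 1

    # Running sum of the difference array gives the depth at each position.
    depth, running = [], 0
    for pos in range(1, genome_length + 1):
        running += delta[pos]
        depth.append(running)
    return depth


def summarize(depth):
    """Maximum depth, mean depth, and fraction of positions covered."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "max_depth": max(depth),
        "mean_depth": sum(depth) / len(depth),
        "fraction_covered": covered / len(depth),
    }
```

For a toy 10 bp genome, `coverage_profile(10, [(1, 4), (3, 6), (9, 8)])` gives the depths `[1, 1, 2, 2, 1, 1, 0, 1, 1, 0]`. The difference-array approach keeps the cost proportional to the number of hits plus the genome length, which matters little at ~4.5 kb but scales to larger references.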

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Phages, Reefs, and Human Disturbance


The Northern Line IslandsExpedition, 2005

[Map labels: Kingman, Palmyra, Washington, Fanning, Christmas]

Christmas to Kingman Bias in No. Phage Hosts

Negative numbers mean relatively more phage hosts at Kingman

More pathogens at Christmas. More people at Christmas.

More photosynthesis at Kingman. No people at Kingman.

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Phages enrich for important genes

Rios Mesquites Stromatolites

• No photosynthesis genes in phages

Pozas Azules Stromatolites

• 5 different photosynthesis genes in phages

RNR is the most successful reaction in evolution

Outline

• The envy is not mine

• A tour around the world, thanks to phage

• People suck

• What is the most successful gene in evolution?

• Is there a Future?

Computational Challenges

• Sequence annotations and analysis

– What is there?

– What is it doing?

– How is it doing it?

• Gene predictions in unknowns

– Lutz Krause (Bielefeld)

• Sequence comparisons

– BLAST

– Other ways to rapidly compare short sequences

– What happens when everyone is using 454 sequencing?

Sequence data from 21 libraries

6 million sequences

600 million bp

• Each BLASTX search takes 1,000 CPU hours

• 21 libraries = 21,000 CPU hours or 2.4 CPU years

• Users want

– repeat runs

– TBLASTX

– more analysis

– more data

– more, more, more, more
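The CPU budget on this slide is a straightforward multiplication, and it grows linearly with every extra search users ask for. The small Python sketch below reproduces the slide's figure; the `searches_per_library` parameter is a hypothetical knob added to show the effect of repeat runs or an additional TBLASTX pass, assuming each extra search costs about the same as one BLASTX run.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def blast_cpu_budget(libraries, cpu_hours_per_search, searches_per_library=1):
    """Return (total CPU hours, total CPU years) for a set of libraries."""
    total_hours = libraries * cpu_hours_per_search * searches_per_library
    return total_hours, total_hours / HOURS_PER_YEAR


# The slide's case: 21 libraries, one 1,000 CPU-hour BLASTX search each.
hours, years = blast_cpu_budget(21, 1_000)
print(f"{hours:,} CPU hours = {years:.1f} CPU years")

# Hypothetical: one extra search of similar cost per library doubles the bill.
hours2, years2 = blast_cpu_budget(21, 1_000, searches_per_library=2)
print(f"{hours2:,} CPU hours = {years2:.1f} CPU years")
```

The first call gives 21,000 CPU hours, or about 2.4 CPU years, matching the slide.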

SDSU: Forest Rohwer, Beltran Rodriguez-Brito

USF: Mya Breitbart

Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes

Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)

Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller

MIT: Ed DeLong

FIG: Veronika Vonstein, Ross Overbeek, Annotators

ANL: Rick Stevens, Bob Olsen, CI Support

Also at SDSU: Anca Segall, Stanley Maloy

UBC: Curtis Suttle, Amy Chan