High Throughput Computational Sequence Analysis Rob Edwards redwards@salmonella.org Argonne National Laboratory San Diego State University
Date posted: 19-Dec-2015

Page 1: High Throughput Computational Sequence Analysis Rob Edwards redwards@salmonella.org Argonne National Laboratory San Diego State University.

High Throughput Computational Sequence Analysis

Rob Edwards, redwards@salmonella.org

Argonne National Laboratory, San Diego State University

Page 2:

How much has been sequenced

[Figure: number of known sequences by year. Labeled points: first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; environmental sequencing.]

Page 3:

How much will be sequenced

[Figure labels: Everybody in San Diego; Everybody in USA; All cultured bacteria; 100 people; One genome from every species; Most major microbial environments.]

Page 4:

High Performance Computing

Page 5:

TeraGrid

Page 6:

The TeraGrid National Resource

Page 7:

Life Sciences Gateway to TeraGrid

Page 8:

Subsystems

Page 9:

Subsystems make up metabolism

[Figure credit: Wikipedia Metabolism, http://en.wikipedia.org/wiki/Portal:Metabolism]

Page 10:

Subsystems are not just metabolism

http://aig.cs.man.ac.uk/gallery/Utopia/

Enzyme complex

http://webdeptos.uma.es/

Cell Machinery

http://www.brown.edu/

Cell Processes

Page 11:

http://www.theseed.org

Page 12:

http://www.theseed.org

Page 13:

Growth in generation of subsystems

Page 14:

Microbial Genomics Annotation Platform

• Goal 1: Automate the generation of high quality annotations by leveraging the information contained in SubSystems and FIGfams.

• Goal 2: Minimize turnaround time. Initial target 48 hours

Page 15:

• Automated process consisting of:
  – Gene calling
  – Initial annotation of function
  – Initial metabolic reconstruction

• Process takes 1-7 hours depending on size and complexity of the genome

• ~20 genomes per day

• Password protected, secure, private

• Release to public databases if required

Freely available annotation service

http://www.nmpdr.org/anno-server/index48.cgi
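
As a rough sanity check on the figures above, the following sketch estimates how many genomes would need to be in flight at once to sustain about 20 genomes per day when each takes 1-7 hours. The helper name `concurrent_jobs_needed` and the assumption of independent jobs spread evenly over a 24-hour day are illustrative only; the slides do not describe how the annotation server actually schedules work.

```python
import math


def concurrent_jobs_needed(genomes_per_day: float, hours_per_genome: float) -> int:
    """Minimum number of genomes processed at once (hypothetical helper).

    Assumes jobs are independent and spread evenly over a 24-hour day.
    """
    compute_hours_per_day = genomes_per_day * hours_per_genome
    return math.ceil(compute_hours_per_day / 24)


# Figures taken from the slide: ~20 genomes per day, 1-7 hours each.
best_case = concurrent_jobs_needed(20, 1)   # every genome takes 1 hour
worst_case = concurrent_jobs_needed(20, 7)  # every genome takes 7 hours
print(f"{best_case} to {worst_case} concurrent jobs")
```

Under these assumptions the stated throughput needs somewhere between one and six genomes running concurrently, depending on genome size and complexity.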

Page 16:

Some estimate of annotation quality

[Figure: bar chart, axis 0 to 50 (percent). Genomes, with the numeric IDs shown on the slide: Bacillus anthracis str. Sterne (260799); Mycobacterium tuberculosis CDC1551 (83331); Listeria monocytogenes EGD-e (169963); Streptococcus pyogenes M1 GAS (160490); Staphylococcus aureus subsp. aureus MW2 (196620). Series: % in SS SEED; % in SS SP1Ke; % hypothetical SP1Ke; % hypothetical SEED.]
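
The two measures named in the chart legend (percent of genes in subsystems and percent annotated as hypothetical) can be computed along the lines of the sketch below. The `Gene` record, the rule that a function containing the word "hypothetical" counts as hypothetical, and the four example genes are all assumptions made for illustration; the slides do not say how the original figures were calculated.

```python
from typing import Iterable, NamedTuple, Optional


class Gene(NamedTuple):
    function: str             # assigned function text
    subsystem: Optional[str]  # subsystem name, or None if unassigned


def annotation_summary(genes: Iterable[Gene]) -> dict:
    """Return percent of genes in a subsystem and percent hypothetical."""
    genes = list(genes)
    total = len(genes)
    if total == 0:
        return {"pct_in_subsystem": 0.0, "pct_hypothetical": 0.0}
    in_subsystem = sum(1 for g in genes if g.subsystem)
    # Assumed rule: any function mentioning "hypothetical" counts.
    hypothetical = sum(1 for g in genes if "hypothetical" in g.function.lower())
    return {
        "pct_in_subsystem": 100.0 * in_subsystem / total,
        "pct_hypothetical": 100.0 * hypothetical / total,
    }


# Made-up example genes, not data from the slide.
example = [
    Gene("DNA polymerase III alpha subunit", "DNA replication"),
    Gene("hypothetical protein", None),
    Gene("Conserved hypothetical protein", None),
    Gene("Glucokinase", "Glycolysis"),
]
summary = annotation_summary(example)
print(summary)
```

A higher share of genes in subsystems together with a lower share of hypothetical proteins is what the chart appears to use as a proxy for annotation quality.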

Page 17:

Evaluation / Viewing

Page 18:

Download results

• We provide a number of export formats:
  – GenBank, FASTA, GFF3, Excel
  – Can easily be extended to all formats supported by BioPerl

• Genomes can be deleted by the user at any time (we keep them for a maximum of 120 days)

• Genomes can be directly imported into the SEED if the user wishes

• All genomes are password protected
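
To give a sense of what two of the listed export formats look like, here is a minimal sketch that writes one FASTA record and one GFF3 feature line using only the Python standard library. This is not the service's exporter (the slide indicates that one builds on BioPerl); the function names, the `peg.1` identifier, and the example values are hypothetical, and attribute values are assumed not to contain characters that GFF3 requires to be escaped.

```python
def to_fasta(seq_id: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    header = f">{seq_id} {description}".rstrip()
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *chunks]) + "\n"


def to_gff3_line(seqid: str, source: str, ftype: str, start: int, end: int,
                 strand: str, attributes: dict, score: str = ".", phase: str = ".") -> str:
    """Format one feature as a GFF3 line (nine tab-separated columns).

    Assumes attribute values need no escaping (no ';', '=', '&', ',' or tabs).
    """
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end), score, strand, phase, attr_text]
    return "\t".join(columns)


# Hypothetical example values, for illustration only.
fasta_record = to_fasta("peg.1", "ATGC" * 20, "Glucokinase")
gff3_line = to_gff3_line("contig1", "example", "CDS", 100, 400, "+",
                         {"ID": "peg.1", "product": "Glucokinase"}, phase="0")
print(fasta_record, end="")
print(gff3_line)
```

GFF3 coordinates are 1-based and inclusive, so the feature above spans positions 100 through 400 of `contig1`.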

Page 19:

Metagenomics SEED

Page 20:

http://metagenomics.theseed.org

Page 21:

Metagenome Metabolic Reconstruction

Page 22:

Starch utilization in cow rumens

Page 23:

Metabolic potential in environments

Page 24:

Too much will be sequenced

[Figure labels: Everybody in San Diego; Everybody in USA; All cultured bacteria; 100 people; One genome from every species; Most major microbial environments.]

Page 25:

Acknowledgements

Argonne National Laboratory: Rick Stevens, Bob Olson, Folker Meyer

San Diego State University: Forest Rohwer

Fellowship for Interpretation of Genomes: Ross Overbeek, Veronika Vonstein, The Annotators