Genomics of Microbial Eukaryotes

Post on 24-Jan-2016


Genomics of Microbial Eukaryotes. Igor Grigoriev, Fungal Genomics Program Head, US DOE Joint Genome Institute, Walnut Creek, CA. Outline: Eukaryotic Genome Annotation; Fungal Genomics Program; MycoCosm.

Transcript of Genomics of Microbial Eukaryotes

Genomics of Microbial Eukaryotes

Igor Grigoriev, Fungal Genomics Program Head

US DOE Joint Genome Institute, Walnut Creek, CA

<ivgrigoriev@lbl.gov>

2

Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm

3

Are you in the right room?

genome.jgi.doe.gov

IMG

MycoCosm

100+ annotated eukaryotic genomes

4

Started with Human Genome Project

5

Protein-based methods build CDS exons around alignments of known proteins (Fgenesh+, GeneWise).

GenBank protein --> predicted model

Transcript-based methods map or assemble transcripts on the genome, including UTRs (EST_map, CombEST).

EST contig --> predicted model

Ab initio methods use the structures of known genes to predict start, stop, and splice sites, in CDS only (Fgenesh, GeneMark).

Train on known genes

[Figure: gene model showing promoter, 5' UTR, start codon (ATG), exons and introns bounded by GT/AG splice sites, stop codon (TGA), 3' UTR, and polyA signal]

Eukaryotic Gene Prediction
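To make the signals named on this slide concrete (ATG start, stop codon, GT...AG intron boundaries), here is a minimal Python sketch that only enumerates candidate sites in a toy sequence. It is not how Fgenesh or GeneMark work internally; those train statistical models on known genes. The function names, thresholds, and sequences are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) of ATG...stop open reading frames, end exclusive."""
    seq = seq.upper()
    orfs = []
    # Lookahead regex so overlapping ATG positions are all reported.
    for start in (m.start() for m in re.finditer("(?=ATG)", seq)):
        # Walk codon by codon, in frame with the ATG, until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs

def find_intron_candidates(seq, min_len=6):
    """Return (start, end) of spans that begin with GT and end with AG."""
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() + 2 for m in re.finditer("(?=AG)", seq)]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]

print(find_orfs("CCATGAAACCCGGGTGACC"))          # [(2, 17)]
print(find_intron_candidates("ACGTAAACCCAGTT"))  # [(2, 12)]
```

Real genomes contain far more GT/AG pairs and ATG triplets than true sites, which is why predictors score candidates with models trained on known genes rather than listing them all.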

6

Predicted protein

Protein Annotation

Higher order assignments:
• Gene Ontology terms
• EC numbers --> KEGG pathways
• Gene families, with and without other species

Possible orthologs (in nr, SwissProt, KEGG, KOG)

Possible paralogs (Blastp+MCL)

Domains (InterPro, TMHMM)

Signal peptide (SignalP)
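The paralog step on this slide pairs Blastp hits with MCL clustering. As a rough illustration of turning pairwise hits into groups, the Python sketch below uses single-linkage clustering with a union-find structure. This is a deliberately simpler stand-in and is not the MCL algorithm; the hit tuples, protein IDs, and e-value cutoff are hypothetical.

```python
def cluster_hits(hits, max_evalue=1e-5):
    """Group proteins linked by BLAST hits that pass the e-value cutoff.

    hits: iterable of (query_id, subject_id, evalue) tuples.
    Single-linkage clustering, used here as a simple stand-in for MCL.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        a, b = find(query), find(subject)  # registers both proteins
        if evalue <= max_evalue and a != b:
            parent[a] = b                  # merge the two groups

    clusters = {}
    for protein in list(parent):
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

hits = [("p1", "p2", 1e-30), ("p2", "p3", 1e-10), ("p4", "p5", 0.5)]
print(cluster_hits(hits))  # [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

Single linkage tends to chain unrelated families together through multi-domain proteins, which is one reason graph-flow methods such as MCL are preferred in practice.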

7

EST Support is Critical for Eukaryotes

[Figure: stacked bar chart (0% to 100%) of genes supported by ESTs versus other genes for N. haematococca, L. bicolor, P. blakesleeanus, S. roseus, M. graminicola, A. niger*, N. discreta*, W. cocos, G. trabeum, and A. aculeatus; EST sequencing platforms: Sanger, 454, Illumina]

[Figure: EST profile and CombEST gene models]
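One simple way to arrive at a "supported by ESTs" percentage like the one charted here is to count gene models that overlap at least one EST alignment. The Python sketch below assumes that definition; the slide does not state the actual criterion, and the coordinates and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (scaffold, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def est_support_fraction(genes, est_alignments):
    """Fraction of gene models overlapped by at least one EST alignment."""
    if not genes:
        return 0.0
    supported = sum(
        1 for gene in genes
        if any(overlaps(gene, est) for est in est_alignments)
    )
    return supported / len(genes)

genes = [("s1", 100, 500), ("s1", 800, 1200), ("s2", 50, 300), ("s2", 1000, 1500)]
ests = [("s1", 450, 600), ("s2", 0, 60)]
print(f"{est_support_fraction(genes, ests):.0%}")  # 50%
```

A stricter analysis would also require matching strand and agreement between EST splice junctions and the model's exon boundaries.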

8

Best Models

FGENESH

Representative set

GENEWISE

EXTERNAL MODELS

• Multiple gene predictors offer several different gene models at each gene locus.
• A single best model from each locus is automatically selected based on homology and EST support.
• These compose a non-redundant (or Filtered) gene set for further analysis.
• This set is further improved during community-driven manual curation.
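The selection of one best model per locus can be sketched as a scoring problem. The Python example below assumes each candidate model carries a homology score and an EST coverage value and combines them with equal weights; the slide does not specify the actual scoring scheme, so the fields, weights, and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    locus: str
    predictor: str
    homology_score: float  # hypothetical: normalized alignment score in [0, 1]
    est_coverage: float    # hypothetical: fraction of the model covered by ESTs

def select_best_models(models, w_homology=0.5, w_est=0.5):
    """Keep the highest-scoring model at each locus (a filtered gene set)."""
    best = {}
    for model in models:
        score = w_homology * model.homology_score + w_est * model.est_coverage
        if model.locus not in best or score > best[model.locus][0]:
            best[model.locus] = (score, model)
    return {locus: model for locus, (_, model) in best.items()}

candidates = [
    GeneModel("locusA", "FGENESH", 0.9, 0.2),
    GeneModel("locusA", "GENEWISE", 0.6, 0.8),
    GeneModel("locusB", "EXTERNAL", 0.3, 0.3),
]
filtered = select_best_models(candidates)
print({locus: m.predictor for locus, m in filtered.items()})
# {'locusA': 'GENEWISE', 'locusB': 'EXTERNAL'}
```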

9

Annotation Pipeline

Genomic assembly and EST contigs

Annotation:
• Gene predictions
• Protein annotations
• Transcript + protein maps
• Repeat mask
• Manual curation

Bring it all together

Analysis:
• Gene families
• Gene expression
• Phylogenomics
• Proteomics
• Protein targeting
• etc.

10

Many Genes of Eco-responsive Daphnia pulex

• First crustacean and aquatic animal sequenced; new model organism
• 30,940 predicted D. pulex genes in ~200 Mb genome
• 85% supported by 1+ lines of evidence

Colbourne et al., Science, 2011
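The figures on this slide imply a gene density and a count of evidence-supported genes. The short Python calculation below works them out from the slide's own numbers; the genome size is the approximate ~200 Mb quoted above, so the results are approximate as well.

```python
predicted_genes = 30_940
genome_size_mb = 200      # approximate value from the slide
supported_percent = 85    # share with 1+ lines of evidence

genes_per_mb = predicted_genes / genome_size_mb
supported_genes = predicted_genes * supported_percent // 100  # integer arithmetic

print(f"{genes_per_mb:.1f} genes per Mb")    # 154.7 genes per Mb
print(f"{supported_genes} supported genes")  # 26299 supported genes
```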

11

Half of Daphnia Genes have no Homologs

With Evgeny Zdobnov’s group (Univ. Genève)

* Of 716 highly conserved single-copy orthologs, Daphnia is missing only two.
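The footnote amounts to a completeness check: what fraction of a conserved ortholog set is found in the gene catalog. A minimal Python sketch of that check follows; the ortholog IDs are placeholders, and only the counts (716 expected, two missing) come from the slide.

```python
def ortholog_completeness(expected_ids, found_ids):
    """Return (fraction of expected orthologs found, sorted missing IDs)."""
    expected = set(expected_ids)
    missing = expected - set(found_ids)
    return 1 - len(missing) / len(expected), sorted(missing)

# Placeholder IDs; the real ortholog set is not listed on the slide.
expected = [f"OG{i}" for i in range(716)]
found = [og for og in expected if og not in {"OG10", "OG500"}]

fraction, missing = ortholog_completeness(expected, found)
print(f"{fraction:.1%} complete, missing: {missing}")
# 99.7% complete, missing: ['OG10', 'OG500']
```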

12

Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm

13

Fungal Genomics for Energy and Environment

Grow: plant symbionts and pathogens

Degrade: lignocellulose degradation

Ferment: sugar fermentation

Bio-refinery

GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications

14

15

• Plant feedstock health
  • Symbiosis
  • Plant pathogenicity
  • Biocontrol
• Biorefinery fungi
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms
• Fungal diversity
  • Phylogenetic
  • Ecological

Genomic Encyclopedia of Fungi Launched

www.jgi.doe.gov/fungi
• 100+ fungal genomes
• 600+ registered users
• 5000+ visitors/month

16

Distinct Mechanisms of Cellulose Degradation

White rot: P. chrysosporium
• Cellobiohydrolase II: GH6 (CBH50)
• Cellobiohydrolase I: GH7 (CBH58, 62)
• Endoglucanases: GH5-CBM1, GH12
• GH3 β-glucosidase

Cellulose

No cellulose binding domain CBM1 in brown rot!

Brown rot: Postia placenta
• Fe2+ + H2O2 --> Fe3+ + HO- + HO•
• Fe3+
• Glucose
• Copper radical oxidases
• Glucose oxidases
• Iron reductase

Martinez et al, PNAS 2009

17

Diverse Basidiomycota

• FGP09 pilots
• Basidio jam (Mar 2010)
• 3 CSP11 proposals
• Basidio jam (Mar 2011)

18

Future Grand Challenges

Fungal isolates & groups

Systems of interacting organisms

Systems in wild

MODELING

FUNCTION

SEQUENCE

1. 1000 fungal genomes: sampling fungal diversity

2. Model fungi: sampling 100s of conditions

3. Fungal ecosystems: bioenergy crop symbionts & pathogens, biorefinery, fungal metagenomes

19

Leadership in Sequencing Fungi

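[Figure: pie chart of fungal genomes by phylum: Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Unknown, Zygomycota; labeled shares: 68%, 23%]

[Figure: pie chart of fungal genomes by sequencing center: DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ, other; labeled shares: 13%, 10%, 31%, 5%, 41%]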

20

Annotation and Analysis Tools

• Automated Annotation
  • Pipeline
• Genomics Analysis Platform
  • Genome Centric
  • Comparative Genomics
• Community Resource
  • Integrated data
  • User tools
  • Training

21

Genome-Centric View

Comparative View

www.jgi.doe.gov/fungi

22

Genome-Centric View

Focus: functional genomics, user data deposition and curation

23

New Comparative View

24

Community Building Tools

• Jamborees:
  • Genome analysis for publications
• MycoCosm Tutorials:
  • On-line video, MGM, workshops w/ large meetings (Asilomar, JGI Users, MSA)
• Preparation for CSP:
  • Large meetings and focused groups

25

Summary

Eukaryotic Annotation Recipe:
• Combined gene predictors, experimental data, and community annotation

Fungal Genomics Program:
• Scaled-up sequencing & comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)

26

Outline

Eukaryotic Genome Annotation

Fungal Genomics Program

MycoCosm