Evolution of gene family size change in fungi
-
Upload
jason-stajich -
Category
Technology
-
view
2.270 -
download
2
description
Transcript of Evolution of gene family size change in fungi
Evolution of gene family size change in fungi
Jason StajichUniversity of California, Berkeley
BD model uses information about the time in the phylogenetictree and the birth and death rates of genes, it offers an ideal nullmodel for hypothesis testing. Using a BD model in this waymakes it possible to identify gene families that have undergoneunusual expansions or contractions. This method furthermoreenables us to identify the branch in the phylogeny upon whichthe unlikely change took place.
As argued above, likelihoods or conditional likelihoods can-not directly be used to identify unusual gene families, becauselarger gene families will by necessity result in lower likelihoodsunder a stochastic BD process (the “large family bias”). Instead,we can use our conditional likelihoods as test statistics to calcu-late conditional P-values, each one conditioned on one of thepossible root-node assignments. Such a conditional P-value isdefined as the probability that a random gene family (with fixedroot family size) has a smaller conditional likelihood than thegiven gene family. Then, because the true root-node value is un-known, we conservatively pick the largest conditional P-value,which we can show to represent a tight upper bound on the trueP-value in our problem (see Methods; Supplemental material).Such an upper bound on the P-value is called a supremum P-value in statistics, and it is often used for composite hypothesistesting with one or more nuisance parameters (Lehmann 1959;Demortier 2003). Because of its tightness as an upper bound inour problem, we refer to the supremum P-value as simply theP-value in the remainder of this study. In the Methods section weshow how it can efficiently and accurately be computed using asampling procedure.
Furthermore, we propose two methods to identify thebranch in the phylogeny upon which nonrandom changes oc-curred (for families with a low P-value). Our first method com-putes a P-value corresponding to the observed data after the de-letion of one branch in the PGM, and this once for each branch(for each gene family). If, after the deletion of a branch, theresulting P-value rises above some threshold P-value (0.01 here),then the branch that was cut is implicated in nonrandom evo-lution. Our second method uses a likelihood ratio test to comparea model allowing the ! parameter to vary along each branchsingly to the model with one ! for the whole tree (see Methods;Supplemental materials). It is notable that, in all cases, thebranch with the largest likelihood ratio was also the branch thatyielded the largest P-value after cutting it, as computed by thefirst method.
Global view of Saccharomyces gene family evolutionWe used the machinery described above to study the evolution ofgene family size in five whole fungal genomes. To our knowl-edge, the five sequenced Saccharomyces genomes are the best ex-ample of a closely related group of eukaryotes, where multiplewhole genomes have been sequenced and where there is also awell-supported phylogenetic tree with branch lengths.
The consensus phylogenetic tree of the five Saccharomycesspecies (Fig. 2) comes from the study of Rokas et al. (2003) thatused 106 orthologous genes from each of the species, singly andby concatenation. The tree had 100% bootstrap support at everynode. In Newick notation, the tree in Figure 2 is written (S. baya-nus (S. kudriavzevii(S. mikatae(S. paradoxus S. cerevisiae)))). Branchlengths were inferred from the data in Rokas et al. (2003) andKellis et al. (2003). They are indicated in Figure 2 as time, t, inmillion years. We estimated the evolutionary rate parameter ! as0.002 per million years (see Supplemental materials).
To define gene families, we took all of the genes in all fivespecies together and generated a pairwise matrix of distancesamong genes (see Supplemental materials). We then clusteredgenes using the TRIBE-MCL algorithm (Van Dongen 2000; En-right et al. 2002), and counted the number of genes in eachfamily that came from each species. By clustering all of the genesat the same time, we are able to confidently compare the size offamilies between genomes.
In the 32 million years since the most recent common an-cestor of the five species, 1254 of the 3517 gene families sharedamong them have changed in size; the remaining set are mono-morphic across the tree (of course, equal numbers of losses andgains in any single gene family will be unobservable). Using ourPGM we were able to infer the most likely ancestral gene familysizes for all of these gene families. This makes it possible to countchanges in gene family size on all eight branches of the tree, andenables us to infer their direction by a comparison of the speciesat the top and bottom of each branch in the tree. Expansionsoutnumbered contractions on four of the eight branches, andcontractions outnumbered expansions on the remaining four.Table 1 shows the number of families that expanded, contracted,or stayed the same on each branch of the tree.
We can see that along branches 2 and 3, leading to S. kudria-vzevii and S. mikatae, many more families have expanded thancontracted. Concomitant with this, these two genomes havemore genes (7144 and 7236) than any of the other three (6265,6128, and 6700 for S. bayanus, S. paradoxus, and S. cerevisiae; see
Figure 2. The phylogenetic tree. Branch lengths t are given in millionsof years. The branch numbers used in this study are shown in circles.
Table 1. The number of gene families that showed an expansion,no change, or a contraction along the eight branches, accordingto the most likely assignments of the gene family sizes ofthe ancestors
Branch # ExpansionsNo
change ContractionsAverage
expansion
1 (t = 32) 97 3181 239 !0.0502 (t = 27) 383 3032 102 0.0953 (t = 22) 509 2922 86 0.1474 (t = 12) 96 3383 38 0.0195 (t = 12) 44 3426 47 0.0216 (t = 5) 3 3491 23 !0.0057 (t = 10) 10 3313 194 !0.0528 (t = 5) 2 3515 0 0.001
The first column contains the branch number, along with the length ofthe branch, t, in millions of years. The next three columns show howoften an expansion, no change, or a contraction occurred along thisbranch. The last column shows the average gene family expansionamong all families along each branch, where a contraction is counted asa negative expansion.
Gene family evolution
Genome Research 1155www.genome.org
N.crassaA.gossypiiR.oryzaeA.oryzaeA.terreusC.cinereusU.maydis
101
1
10
100
1000
10000
100
Family size
Fre
qu
en
cy o
f F
am
ily s
ize
Outline
• Gene family size change - a model
• Cornucopia of fungal genomes
• Methodology for comparing family size
• Lineage specific expansions
Gene family evolution• Gene duplications are the crucible of new
genes and thus new functions
• Many comparative approaches focus only on identifiable one-to-one orthologs.
• Signature of adaptive evolution can be confounded in multi-gene families
• How important is lineage-specific expansion in adaptive changes?
Identifying family expansions
• Previous work only considered pairwise
• Ad hoc comparison of gene family sizes
• C.elegans-C.briggsae - GPCR family expansions (Stein et al, PLOS Biology 2004)
• A. gambiae-D. melanogaster - Mosquito specific family expansions related to symbiotic bacteria (Holt et el, Science 2002).
• Need a null model
Gene family sizes follow power law distribution
N.crassaA.gossypiiR.oryzaeA.oryzaeA.terreusC.cinereusU.maydis
101
1
10
100
1000
10000
100
Family size
Fre
quency o
f F
am
ily s
ize
Multicopy genesSugar transporters
P450 Enzymes
single copy genesPRP8 (splicing)
CDC48 (cell cycle ATPase)
Phylogenetic evaluation of gene family size change• Previous methods only used ad hoc
statistics
• Explicit model for gene family size change according to a Birth-Death models
• Apply BD to family size along phylogeny using probabilistic graph models
• CAFE - Computational Analysis of gene Family Evolution Hahn et al, Genome Res 2005
De Bie, et al Bioinformatics 2006Demuth et al, submitted
CAFE
• Use a Probabilistic Graph Model for:
• Ancestral states
• Birth and Death rate (lamda)
• Per branch changes
• P-values
BD model uses information about the time in the phylogenetictree and the birth and death rates of genes, it offers an ideal nullmodel for hypothesis testing. Using a BD model in this waymakes it possible to identify gene families that have undergoneunusual expansions or contractions. This method furthermoreenables us to identify the branch in the phylogeny upon whichthe unlikely change took place.
As argued above, likelihoods or conditional likelihoods can-not directly be used to identify unusual gene families, becauselarger gene families will by necessity result in lower likelihoodsunder a stochastic BD process (the “large family bias”). Instead,we can use our conditional likelihoods as test statistics to calcu-late conditional P-values, each one conditioned on one of thepossible root-node assignments. Such a conditional P-value isdefined as the probability that a random gene family (with fixedroot family size) has a smaller conditional likelihood than thegiven gene family. Then, because the true root-node value is un-known, we conservatively pick the largest conditional P-value,which we can show to represent a tight upper bound on the trueP-value in our problem (see Methods; Supplemental material).Such an upper bound on the P-value is called a supremum P-value in statistics, and it is often used for composite hypothesistesting with one or more nuisance parameters (Lehmann 1959;Demortier 2003). Because of its tightness as an upper bound inour problem, we refer to the supremum P-value as simply theP-value in the remainder of this study. In the Methods section weshow how it can efficiently and accurately be computed using asampling procedure.
Furthermore, we propose two methods to identify thebranch in the phylogeny upon which nonrandom changes oc-curred (for families with a low P-value). Our first method com-putes a P-value corresponding to the observed data after the de-letion of one branch in the PGM, and this once for each branch(for each gene family). If, after the deletion of a branch, theresulting P-value rises above some threshold P-value (0.01 here),then the branch that was cut is implicated in nonrandom evo-lution. Our second method uses a likelihood ratio test to comparea model allowing the ! parameter to vary along each branchsingly to the model with one ! for the whole tree (see Methods;Supplemental materials). It is notable that, in all cases, thebranch with the largest likelihood ratio was also the branch thatyielded the largest P-value after cutting it, as computed by thefirst method.
Global view of Saccharomyces gene family evolutionWe used the machinery described above to study the evolution ofgene family size in five whole fungal genomes. To our knowl-edge, the five sequenced Saccharomyces genomes are the best ex-ample of a closely related group of eukaryotes, where multiplewhole genomes have been sequenced and where there is also awell-supported phylogenetic tree with branch lengths.
The consensus phylogenetic tree of the five Saccharomycesspecies (Fig. 2) comes from the study of Rokas et al. (2003) thatused 106 orthologous genes from each of the species, singly andby concatenation. The tree had 100% bootstrap support at everynode. In Newick notation, the tree in Figure 2 is written (S. baya-nus (S. kudriavzevii(S. mikatae(S. paradoxus S. cerevisiae)))). Branchlengths were inferred from the data in Rokas et al. (2003) andKellis et al. (2003). They are indicated in Figure 2 as time, t, inmillion years. We estimated the evolutionary rate parameter ! as0.002 per million years (see Supplemental materials).
To define gene families, we took all of the genes in all fivespecies together and generated a pairwise matrix of distancesamong genes (see Supplemental materials). We then clusteredgenes using the TRIBE-MCL algorithm (Van Dongen 2000; En-right et al. 2002), and counted the number of genes in eachfamily that came from each species. By clustering all of the genesat the same time, we are able to confidently compare the size offamilies between genomes.
In the 32 million years since the most recent common an-cestor of the five species, 1254 of the 3517 gene families sharedamong them have changed in size; the remaining set are mono-morphic across the tree (of course, equal numbers of losses andgains in any single gene family will be unobservable). Using ourPGM we were able to infer the most likely ancestral gene familysizes for all of these gene families. This makes it possible to countchanges in gene family size on all eight branches of the tree, andenables us to infer their direction by a comparison of the speciesat the top and bottom of each branch in the tree. Expansionsoutnumbered contractions on four of the eight branches, andcontractions outnumbered expansions on the remaining four.Table 1 shows the number of families that expanded, contracted,or stayed the same on each branch of the tree.
We can see that along branches 2 and 3, leading to S. kudria-vzevii and S. mikatae, many more families have expanded thancontracted. Concomitant with this, these two genomes havemore genes (7144 and 7236) than any of the other three (6265,6128, and 6700 for S. bayanus, S. paradoxus, and S. cerevisiae; see
Figure 2. The phylogenetic tree. Branch lengths t are given in millionsof years. The branch numbers used in this study are shown in circles.
Table 1. The number of gene families that showed an expansion,no change, or a contraction along the eight branches, accordingto the most likely assignments of the gene family sizes ofthe ancestors
Branch # ExpansionsNo
change ContractionsAverage
expansion
1 (t = 32) 97 3181 239 !0.0502 (t = 27) 383 3032 102 0.0953 (t = 22) 509 2922 86 0.1474 (t = 12) 96 3383 38 0.0195 (t = 12) 44 3426 47 0.0216 (t = 5) 3 3491 23 !0.0057 (t = 10) 10 3313 194 !0.0528 (t = 5) 2 3515 0 0.001
The first column contains the branch number, along with the length ofthe branch, t, in millions of years. The next three columns show howoften an expansion, no change, or a contraction occurred along thisbranch. The last column shows the average gene family expansionamong all families along each branch, where a contraction is counted asa negative expansion.
Gene family evolution
Genome Research 1155www.genome.org
Hahn et al, Genome Res 2005
37 Fully sequenced fungal genomes
Rhizopus oryazeNeurospora crassaPodospora anserinaChaetomium globosumMagnaporthe griseaFusarium verticillioidesFusarium graminearumTrichoderma reeseiSclerotinia sclerotiorumBotrytis cinereaStagonospora nodorumUncinocarpus reesiiCoccidioides immitisHistoplasma capsulatumAspergillus fumigatusAspergillus nidulansAspergillus terreusAspergillus oryzaeAshbya gosspyiiKluyveromyces lactisSaccharomyces cerevisiaeCandida glabrataCandida lusitaniaeDebaryomyces hanseniiCandida guilliermondiiCandida tropicalisCandida albicansCandida dubliniensisYarrowia lipolyticaSchizosaccharomyces pombeCryptococcus neoformansCryptococcus neoformans H99Cryptococcus gattii WM276Cryptococcus gattii R265Phanerochaete chrysosporiumCoprinus cinereusUstilago maydis
Saprophyte
Bread mold, Opp Hum pathogen
Saprophyte
Hemibiotroph - Rice
Hemibiotroph - wheat
Primary Hum pathogen
Primary Hum pathogen
Opp Hum pathogen
Saprophyte/Industrial uses
Hemibiotroph - maize
Saprophyte
Necrotroph
Necrotroph - fruits
Hemibiotroph - wheat
Opp Hum pathogen
Biotroph/Industrial uses
Industrial uses
Industrial uses
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogenSaprophyte
SaprophyteBiotroph - maize
Opp Hum pathogen
Opp Hum pathogen
Opp HumOpp Hum pathogen
Industrial uses
SaprophyteOpp Hum Pathogen
Saprophyte
pathogen
0100200300400500600700800900Million
years ago
Euascomycota
Zygomycota
Hemiascomycota
Archiascomycota
Basidiomycota
50+ More funded and in progress world-wide
Species Clade Sequencing centerSchizosaccharomyces japonicus Archaeascomycta Broad-FGISchizosaccharomyces octosporus Archaeascomycta Broad-FGIPneumocystis carinii Archaeascomycta Sanger, UC, Broad-FGIPneumocystis carinii hominis Archaeascomycta UC, Broad-FGI, UCAmanita bisporigera Basidiomycota: Homobasidiomycota MSUCrinipellis perniciosa Basidiomycota: Homobasidiomycota Univ CampinasGanoderma lucidum Basidiomycota: Homobasidiomycota Yang-Ming UnivHebeloma cylindrosporum Basidiomycota: Homobasidiomycota INRALaccaria bicolor Basidiomycota: Homobasidiomycota JGI-DOEPhakopsora pachyrhizi Basidiomycota: Homobasidiomycota JGI-DOEPostia placenta Basidiomycota: Homobasidiomycota JGI-DOESchizophyllum commune Basidiomycota: Homobasidiomycota JGI-DOESporobolomyces roseus Basidiomycota: Urediniomycota JGI-DOEPhakopsora meibomiae Basidiomycota: Urediniomycota JGI-DOEBatrachochytrium dendrobatidis Chytridiomycota Broad-FGI & JGI-DOEPiromyces sp. Chytridiomycota JGI-DOEGlomus intraradices Glomeromycota JGI-DOEPhycomyces blakesleeanus Zygomycota JGI-DOEBrachiola algerae Microsporidia GenoscopeNosema (Antonospora) locustae Microsporidia MBLEnterocytozoon bieneusi Microsporidia Tufts Univ
Table C.3: Additional funded fungal genome sequencing projects as of Spring 2006. This data was partially derivedfrom the Genomes online database (190)
124
Sequencing In-Progress*
R
RR
Species Clade Sequencing centerAspergillus niger Euascomycota: Eurotiomycota DOE-JGIAspergillus flavus Euascomycota: Eurotiomycota NCSUAspergillus clavatus Euascomycota: Eurotiomycota OUNeosartorya fischeri Euascomycota: Eurotiomycetes TIGRHistoplasma capsulatum WU24 Euascomycota: Eurotiomycota Broad-FGIHistoplasma capsulatum 186R,217B Euascomycota: Eurotiomycota WUSTLCoccidioides posadasii Euascomycota: Eurotiomycota TIGRCoccidioides immitis 10 strains Euascomycota: Eurotiomycota Broad-FGI & TIGRParacoccidioides brasiliensis Euascomycota: Eurotiomycota Univ of BrazilAscosphaera apis Euascomycota: Eurotiomycota BCMEpichloe festucae Euascomycota: Sordariomycetes UKPodospora anserina Euascomycota: Sordariomycetes Broad-FGITrichoderma atroviride Euascomycota: Sordariomycetes DOE-JGITrichoderma virens Euascomycota: Sordariomycetes DOE-JGILeptosphaeria maculans Euascomycota: Dothideomycetes GenoscopeAlternaria brassicicola Euascomycota: Dothideomycetes VPI & WUSTLXanthoria parietina (lichen) Euascomycota: Lecanoromycetes DOE-JGICandida albicans WO-1 Hemiascomycota Broad-FGILodderomyces elongisporus Hemiascomycota Broad-FGIPichia stipitis Hemiascomycota JGI-DOESaccharomces bayanus Hemiascomycota (49, 167)Saccharomces castellii Hemiascomycota (49)Saccharomces cerevevisiae RM11-1A Hemiascomycota Broad-FGISaccharomces cerevevisiae YJM789 Hemiascomycota (113)Saccharomyces kluyeri Hemiascomycota WUSTL (finishing)Saccharomces kudriavzevii Hemiascomycota (49)Saccharomces mikatae Hemiascomycota (49, 167)Saccharomces paradoxus Hemiascomycota (167)Saccharomyces pastorianus Hemiascomycota Kitasato UnivZygosaccharomyces rouxii Hemiascomycota CNRS-Genoscope
Table C.2: In progress and funded fungal genome sequencing projects as of Spring 2006. This data was partially derivedfrom the Genomes online database (190)
123
Sequ
enci
ng In
-Pro
gres
s*
+++
R
R
R
R
Genome annotation
• Many of the fungal genomes were only assembled genomic sequence.
• Automated annotation pipeline was built to generate to get systematic gene prediction.
• Several gene prediction programs were trained and results were combined with GLEAN (Liu, Mackey, Roo, et al unpublished) to produce composite gene calls.
Rhizopus oryazeNeurospora crassaPodospora anserinaChaetomium globosumMagnaporthe griseaFusarium verticillioidesFusarium graminearumTrichoderma reeseiSclerotinia sclerotiorumBotrytis cinereaStagonospora nodorumUncinocarpus reesiiCoccidioides immitisHistoplasma capsulatumAspergillus fumigatusAspergillus nidulansAspergillus terreusAspergillus oryzaeAshbya gosspyiiKluyveromyces lactisSaccharomyces cerevisiaeCandida glabrataCandida lusitaniaeDebaryomyces hanseniiCandida guilliermondiiCandida tropicalisCandida albicansCandida dubliniensisYarrowia lipolyticaSchizosaccharomyces pombeCryptococcus neoformansCryptococcus neoformans H99Cryptococcus gattii WM276Cryptococcus gattii R265Phanerochaete chrysosporiumCoprinus cinereusUstilago maydis
Saprophyte
Bread mold, Opp Hum pathogen
Saprophyte
Hemibiotroph - Rice
Hemibiotroph - wheat
Primary Hum pathogen
Primary Hum pathogen
Opp Hum pathogen
Saprophyte/Industrial uses
Hemibiotroph - maize
Saprophyte
Necrotroph
Necrotroph - fruits
Hemibiotroph - wheat
Opp Hum pathogen
Biotroph/Industrial uses
Industrial uses
Industrial uses
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogen
Opp Hum pathogenSaprophyte
SaprophyteBiotroph - maize
Opp Hum pathogen
Opp Hum pathogen
Opp HumOpp Hum pathogen
Industrial uses
SaprophyteOpp Hum Pathogen
Saprophyte
pathogen
0100200300400500600700800900
Million years ago
Euascomycota
Zygomycota
Hemiascomycota
Basidiomycota
Ascomycota
Intron frequency varies among the fungi
1 2 3 4 5 60
100
200
300
400
500
Mean Introns per gene
Me
an
In
tro
n le
ng
th
Rory
Umay
SpomCneo
PchrCcin
Cgla
Klac
Ylip
Scer
Dhan
Agos
Pans
SNAPTwinscan
GlimmerGenscan
ZFF to GFF3
GFF2 to GFF3
Tools::Genscan
Tools::Glimmer
BLASTZBLASTN
SearchIO
Bio:
:DB:
:GFF
HMMER
GFF to AA
Genome
Proteins
predicted proteins
SearchIO
ProteinsFASTAall-vs-all
exonerateprotein2genome
GFF2 to GFF3
Findorthologs
protein to genome
coordinates
GLEAN(combiner)
Tools::GFF
exonerateest2genome
ESTs
SearchIO
MCL
Gene families
Rfam tRNAscan
Multiple sequence alignment
Intron mapping into
alignmentIntron analysis
Bio::AlignIOaa2cds alignment
AnalysisMethodology
http://fungal.genome.duke.edu
Generic Genome Browser
Methods: gene family identification
• All-vs-All pairwise sequence searches (FASTP)
• Cluster genes by similarity using Markov CLustering (MCL) algorithm
• Identify families with unusually large size changes along phylogeny with CAFE
• Use 37 fungal genomes from 5 major clades
A B C D E
FASTAall-vs-all
MCL
Gene families
CAFE
18 U. maydis
5 C. gattii R265
5 C. gattii WM276
5 gattii
5 C. neoformans JEC21
5 C. neoformans var grubii
5 neoformans
5 Cryptococcus
163 P. chrysosporium
141 C.cinereus
136 Homobasidiomycota
23 Hymenomycota
23 Basidiomycota
0100200300400
Family 1 P < 0.001 Branch A
Family 2 P < 0.001 Branch B
Family 3 P=0.02 Branch C,E
Family 4 P=0.03 Branch D
10 1 2
14 18 2
7 1 1
6 1 12
6 1 8
3 1 1+
Family count
Spec
ies
Families with significant expansionsVitamin & Cofactor transport
Lactose & sugar transport
Amine transport
Myo-instol, quinate, and glucose transport
Oligopeptide transport
ABC transporter
MFS, drug pump, & sugar transport
Transport
Monocarboxylate & sugar transport
ABC transport
Amino acid permease
Methytransferase
Cytochrome P450: CYP64
Cytochrome P450: CYP53,57A
Cytochrome P450
Kinase
Subtilase family
NADH flavin oxidoreductase
Aldehyde dehydrogenase
Aldo/kedo reductase
Multicopper oxidase
AMP-binding enzyme
TransportersKinasesP450
Oxidation
49 significant families
Transporters
• Of 45 significant families, 22 were related to transport
• Vitamin and amino acid transport
• Sugar and sugar-like transporters
• Multidrug and efflux pumps
• ABC transporters (ATP Binding Cassette)
21 Rhizopus oryaze
20 Neurospora crassa
27 Podospora anserina
19 Chaetomium globosum27
28
46 Magnaporthe grisea
31
84 Fusarium verticillioides
62 Fusarium graminearum63
33 Trichoderma reesei38
31 Sordariomycetes
22 Sclerotinia sclerotiorum
25 Botrytis cinerea24
30
66 Stagonospora nodorum
25 Uncinocarpus reesii
22 Coccidioides immitis24
17 Histoplasma capsulatum25
44 Aspergillus fumigatus
50 Aspergillus nidulans
64 Aspergillus terreus
90 Aspergillus oryzae62
53
48
33 Eurotiomycota
32
30 Euascomycota
5 Ashbya gosspyii
9 Kluyveromyces lactis8
7 Saccharomyces cerevisiae
6 Candida glabrata7
8
18 Candida lusitaniae
24 Debaryomyces hansenii
18 Candida guilliermondii18
17
8 Candida tropicalis
7 Candida albicans
7 Candida dubliniensis7
8
15 Candidacae
15
30 Yarrowia lipolytica
21 Hemiascomycota
23 HemiEuascomycota
10 Schizosaccharomyces pombe22 Ascomycota
33 Cryptococcus neoformans neoformans
32 Cryptococcus neoformans grubii32
25 Cryptococcus gattii WM276
26 Cryptococcus gattii R26526
28
27 Phanerochaete chrysosporium
24 Coprinus cinereus
24
13 Ustilago maydis22 Basidiomycota
22
22
0100200300400500600700800900
Vitamin & Cofactor
Transporters
Marked branches with
significant (P<0.05)
expansions or contractions
Transporter expansions• Sugar related, Drug pump, and Major
Facilitator Superfamily
• Aspergillus spp, Fusarium spp, S. nodorum
• Euascomycota
• Vitamin transport
• C. neoformans, Fusarium
• A. nidulans (Biotin)
• Saccharomyces expansions independent!
Aspergillus
Fusarium
S. nodorum
C. neoformans
Sugar transporter use in phytopathogens
• Sugar transporters are used to extract nutrients from host
• Haustorium: specialized structure for plant parasitism
• Many sugar transporters highly and specifically expressed in haustoria
Haustorium
http://tolweb.org/Robert Bauer
Podospora anserina
Chaetomium globosum
Neurospora crassaPyrenomycota
Magnaporthe grisea
Fusarium verticillioides
Fusarium graminearum
Trichoderma reesei
Sordariomycota
Sclerotinia sclerotiorum
Botrytis cinerea
Aspergillus terreus
Aspergillus oryzae
Aspergillus nidulans
Aspergillus fumigatusAspergillus
Uncinocarpus reesii
Coccidioides immitis
Histoplasma capsulatum
Eurotiomycota
Stagonospora nodorum
Debaryomyces hansenii
Candida guilliermondii
Candida lusitaniae
Candida albicans
Candida dubliniensis
Candida tropicalis
Candidacae
Ashbya gosspyii
Kluyveromyces lactis
Saccharomyces cerevisiae
Candida glabrata
Saccharomyces
Yarrowia lipolytica
Hemiascomycota
Hemi-Euascomycota
Schizosaccharomyces pombe
Ascomycota
Cryptococcus neoformans
Cryptococcus neoformans grubii
Cryptococcus gattii WM276
Cryptococcus gattii R265Phanerochaete chrysosporium
Coprinus cinereusHomobasidiomycota
Hymenomycota
Ustilago maydisBasidiomycota
Rhizopus oryaze0100200300400500600700800900
400 My
966 My
Basidiomycota changes
U.maydisC.cinereusP.chrysosporiumC.neoformans
P450 CYP64
18 U. maydis
5 C. gattii R265
5 C. gattii WM276
5 gattii
5 C. neoformans JEC21
5 C. neoformans var grubii
5 neoformans
5 Cryptococcus
163 P. chrysosporium
141 C.cinereus
136 Homobasidiomycota
23 Hymenomycota
23 Basidiomycota
0100200300400
P450 enzymes involved in synthesis and cleavage of chemical bonds. Drug metabolism in animals.
CYP64: Step in Aspergillus spp aflatoxin pathwayP. chrysosporium implicated in lignin and hydrocarbon degradation.
Million yearsago
CYP64 was from independent duplication
ccin
03995 ccin
12432
ccin
12477
ccin
12447
ccin
03760 ccin
08843
ccin
08880
ccin
08948
ccin
08949
ccin
09228
ccin
08947
ccin
08946
ccin
12431
ccin
12515
ccin
08608
ccin
12514
ccin
07535
ccin
07536
ccin
05141
ccin
07531
ccin
04462
ccin
04461
ccin
04460
ccin
07538
ccin
01326
ccin
04884
ccin
07555
ccin
07554
ccin
03994
ccin
12516
ccin
09337
ccin
09357
ccin
09950
ccin
00042
ccin
00039
ccin
00043
ccin
11079
ccin
11073
ccin
12301
ccin
03618
ccin
12868
ccin
12386
ccin
03622
ccin
09244
ccin
08520
ccin
10950
ccin
13218
ccin
13220
pchr
04215
pchr
02481
pchr
02475
pchr
02324
pchr
02322
pchr
02461
pchr
02249
pchr
02248
pchr
02471
pchr
02460
pchr
02469
pchr
02472
pchr
02470
pchr
02468
pchr
02462
pchr
02442
pchr
02441
pchr
02317
pchr
02473
pchr
02477
pchr
02480
pchr
02479
pchr
02474
pchr
02459
pchr
02478
pchr
09197
pchr
08602
pchr
02326
pchr
08048
pchr
08046
pchr
08045
pchr
08047
pchr
02328
pchr
06733
tree 1
6195
pchr
10861
pchr
07430
pchr
07443
C. cinereus expansion P. chrysosporium expansion
Tom VolkMario Cervini
Local duplications created CYP64 expansion
9k 10k 11k 12k 13k 14k 15k 16k 17k 18k 19k 20k 21k 22k 23k 24k
pchr_24pchr_24
GLEAN modelsGLEAN_02414
Probability 1
GLEAN_02415
Probability 0.999937
GLEAN_02416
Probability 0.646357
GLEAN_02417
Probability 0.990598
Pfam domainsp450
Cytochrome P450 evalue:1e-28
p450 p450
Cytochrome P450 evalue:9e-07
p450p450
Cytochrome P450 evalue:6.3e-23
p450p450
Cytochrome P450 evalue:6e-26
p450
Interpretation of CYP64 expansion
18 U. maydis
5 C. gattii R265
5 C. gattii WM276
5 gattii
5 C. neoformans JEC21
5 C. neoformans var grubii
5 neoformans
5 Cryptococcus
163 P. chrysosporium
141 C.cinereus
136 Homobasidiomycota
23 Hymenomycota
23 Basidiomycota
0100200300400Million yearsago
Angiosperm diversification
Hydrophobin Family
• Self assembling proteins involved in fungal cell wall
• Part of what makes a mushroom
• 8 Cysteine residues critical to function
• Help spores stay airborne resisting water
P.chr C.cin C.neo U.may
21 33 0 2
umay UM05010umay UM04433
ccin 10587ccin 10586
ccin 05414ccin 09268
ccin 05081ccin 11692
ccin 11691ccin 12456
ccin 12439ccin 03506
ccin 03524ccin 12453
ccin 06183ccin 06192
ccin 06185ccin 06184ccin 06194ccin 08744
ccin 06204ccin 05130ccin 05145
ccin 00406pchr 10481
pchr 10482pchr 03412
pchr 08984pchr 06735
pchr 09319pchr 02564
pchr 02565pchr 02739
pchr 09062pchr 09061
pchr 09060pchr 09067
pchr 00495pchr 08523
pchr 11384pchr 11183
pchr 11134pchr 00475
pchr 09066pchr 00499
ccin 08205ccin 08203ccin 08204
ccin 08198ccin 08201
ccin 08202ccin 08199
ccin 13133ccin 05197ccin 05199
ccin 086570.1
Local Duplications
C. cinereus
P. chrysosporium
Cryptococcus sugar transporters expansion
19 U.maydis
47 C.gattii R265
50 C.gattii WM276
50 gattii
57 C.neoformans JEC21
59 C.neoformans H99
57 neoformans
50 Cryptococcus
23 P.chrysosporium
20 C.cinereus
23 Homobasidio
24 Hymenomycota
24
0100200300400
FIG
.1.
The
exte
rnal
and
inte
rnal
stru
ctur
esof
C.n
eofo
rman
sar
esh
own
bym
eans
ofa
mod
ified
Indi
ain
kpr
epar
atio
n.M
agni
ficat
ion,
ca.!
1,00
0.
Con
tinue
dfro
mpr
eced
ing
page
2291
Cryptococcus sugar transporters
• 3x as many sugar transporters in C. neoformans (~50) than other basidiomycetes
• “sugar coated killer”
• Capsule is a mixture of glucose, xylose, and mannose.
• Transporters could be important in capsule synthesis
Zerpa et al, 1996
Conclusions
• Transporters are highly expanded in independent lineages
• Saprophytic and phytopathogenic lifestyles
• Adaptive Homobasidiomycete (mushroom) expansions
• Lignin degradation - saprophytic lifestyles
• Hydrophobins - cell wall structures
Acknowledgments
Matthew Hahn (Indiana)Jeff Demuth
Sang-Gook Han
Tijl De BieNello Cristianini
Aaron MackeyIan Korf
Mario Stanke
Fred Dietrich (Duke)
Sequencing centersBroad Institute
Joint Genome InstituteGénolevures
Stanford UniversityTIGR
Welcome Trust Sanger Centre(NIH and NSF)