Olivier Elemento, Tavazoie lab
Ab initio genotype-phenotype association reveals intrinsic
modularity in genetic networks (in bacteria)
Some bacterial phenotypes …
• Motility
• Spore formation
• Gram-staining
• Hyper-thermophily
Can we find the genes underlying these phenotypes?
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Motility in bacteria
• Some (but not all) bacteria are motile
• Motile bacteria may share genes involved in motility
• These genes may be absent from non-motile bacteria
[Figure: motility (present / absent) across ~200 bacterial genomes, including B. subtilis, B. anthracis, C. jejuni, E. coli, M. tuberculosis, S. aureus, M. leprae and S. pneumoniae]
(Levesque et al, 2003; Jim, Parmar, Singh and Tavazoie, 2004)
[Figure: presence / absence of E. coli Gene X across ~200 bacterial genomes, shown against the motility profile]
[Figure: presence / absence of E. coli Gene X, …, E. coli Gene Y across ~200 bacterial genomes, shown against the motility profile]
High correlation: Gene Y is likely involved in motility
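One way to make "high correlation" concrete is to score a gene's presence/absence vector against the phenotype vector. The sketch below uses the Pearson (phi) coefficient on 0/1 vectors. The slides do not say which measure was actually used, so this choice, the function name and the toy profiles are assumptions for illustration only.

```python
from math import sqrt

def phi_correlation(gene_profile, phenotype_profile):
    """Pearson correlation between two equal-length 0/1 vectors (phi coefficient)."""
    if len(gene_profile) != len(phenotype_profile):
        raise ValueError("profiles must cover the same genomes")
    n = len(gene_profile)
    # Genomes where both the gene and the phenotype are present.
    n11 = sum(1 for g, p in zip(gene_profile, phenotype_profile) if g and p)
    n_gene = sum(gene_profile)        # genomes carrying the gene
    n_phen = sum(phenotype_profile)   # genomes showing the phenotype
    denom = sqrt(n_gene * (n - n_gene) * n_phen * (n - n_phen))
    if denom == 0:
        return 0.0  # a constant profile carries no information
    return (n * n11 - n_gene * n_phen) / denom

# Toy data for 8 hypothetical genomes (not real annotations).
motility = [1, 1, 1, 1, 0, 0, 0, 0]
gene_x = [1, 0, 1, 0, 1, 0, 1, 0]   # unrelated to the phenotype
gene_y = [1, 1, 1, 1, 0, 0, 0, 0]   # tracks the phenotype exactly

score_x = phi_correlation(gene_x, motility)
score_y = phi_correlation(gene_y, motility)
```

In this toy case Gene Y scores far higher than Gene X, which is the pattern the slide describes.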
[Figure: presence / absence of B. subtilis gene Z (e.g. CheV) across ~200 bacterial genomes, shown against the motility profile]
• Calculate a phylogenetic profile for all 600,000 genes in bacteria (~1.2x10^8 BLASTs)
• Collect the genes most correlated to the phenotype in all bacteria that have the phenotype (~3,000 for motility)
• Merge homologous genes (based on sequence similarity)
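The first step above can be sketched as follows. The figure of ~1.2x10^8 BLASTs is consistent with searching each of the 600,000 genes against each of the ~200 genomes (600,000 x 200 = 1.2x10^8). How a hit is called is not stated in the slides, so the E-value cutoff, the input format and every name below are assumptions.

```python
def phylogenetic_profile(best_evalue, genomes, cutoff=1e-5):
    """Return a 0/1 vector: 1 if the gene has a hit at or below `cutoff` in that genome.

    best_evalue: dict mapping genome name -> best BLAST E-value for one gene
                 (genomes with no hit are simply missing from the dict).
    cutoff:      hypothetical E-value threshold, not taken from the slides.
    """
    return [1 if best_evalue.get(g, float("inf")) <= cutoff else 0 for g in genomes]

# Rough size of the computation: one gene-versus-genome search per pair.
n_genes, n_genomes = 600_000, 200
n_searches = n_genes * n_genomes

# Toy example with placeholder genome names and made-up E-values.
genomes = ["genome_A", "genome_B", "genome_C"]
hits = {"genome_A": 1e-40, "genome_C": 0.3}
profile = phylogenetic_profile(hits, genomes)
```

Here the gene counts as present only in `genome_A`; the weak hit in `genome_C` falls above the assumed cutoff.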
Merging homologous (orthologous/paralogous) genes
~3,000 motility genes merged into 75 groups of homologs (Generic Genes)
[Figure: E. coli Gene Y, B. subtilis Gene Y, B. anthracis Gene Y and C. jejuni Gene Y merged into Generic Gene Y]
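Merging can be pictured as single-linkage grouping: any two genes linked by sufficient sequence similarity end up in the same Generic Gene. The slides only say the merge is based on sequence similarity, so the union-find approach, the pair list and the gene identifiers below are assumptions made for illustration.

```python
def merge_homologs(genes, similar_pairs):
    """Group genes into connected components of the similarity graph."""
    parent = {g: g for g in genes}

    def find(g):
        # Walk up to the group representative, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical gene identifiers and similarity pairs.
genes = ["ecoli_Y", "bsub_Y", "cjej_Y", "ecoli_V", "bsub_V"]
pairs = [("ecoli_Y", "bsub_Y"), ("bsub_Y", "cjej_Y"), ("ecoli_V", "bsub_V")]
generic_genes = merge_homologs(genes, pairs)
```

Note that `ecoli_Y` and `cjej_Y` land in the same group even though they are only linked through `bsub_Y`; a stricter merge rule would need a different grouping criterion.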
Motility
[Figure: Generic Genes V, W, Y and Z, all associated with motility, grouped into Module 1 and Module 2]
Can we recover such modules?
• Cluster Generic Gene profiles 1,000 times using Iclust with different random initializations (obtain slightly different clusters)
• Group together genes which almost always end up in the same cluster
Iclust: Slonim et al, 2006
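The grouping step can be sketched independently of the clustering algorithm. Given the cluster labels from many runs (which would come from Iclust; no clustering is run here), the code counts how often each pair of genes shares a cluster and links the pairs that do so in at least a chosen fraction of runs. The threshold, the function name and the toy runs are assumptions, since the slides only say "almost always".

```python
from itertools import combinations

def consensus_groups(runs, min_fraction=0.9):
    """Group items that share a cluster in at least `min_fraction` of the runs.

    runs: list of clusterings; each is a list giving the cluster label of
          item i in that run (labels need not match between runs).
    """
    n_items = len(runs[0])
    together = {pair: 0 for pair in combinations(range(n_items), 2)}
    for labels in runs:
        for i, j in together:
            if labels[i] == labels[j]:
                together[(i, j)] += 1

    # Link pairs that co-cluster often enough, then read off connected groups.
    group_of = list(range(n_items))
    for (i, j), count in together.items():
        if count / len(runs) >= min_fraction:
            old, new = group_of[j], group_of[i]
            group_of = [new if g == old else g for g in group_of]

    groups = {}
    for item, g in enumerate(group_of):
        groups.setdefault(g, []).append(item)
    return sorted(groups.values())

# Toy input: 5 made-up runs over four Generic Genes; the last run is noisy.
names = ["V", "W", "Y", "Z"]
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
modules = [[names[i] for i in group] for group in consensus_groups(runs, 0.75)]
```

With these made-up runs, V and W co-cluster in every run and Y and Z in four out of five, so two groups emerge at a 0.75 threshold. Which genes belong to which module in the real data is not recoverable from the slides.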
GG-3 flagellar biosynthetic protein flhB
GG-4 flagellar biosynthetic protein flhA
GG-5 flagellar biosynthetic protein fliP
GG-22 flagellar biosynthetic protein fliR
GG-56 flagellar biosynthetic protein fliQ
GG-6 flagellar hook flgE/F/G
GG-7 flagellar motor switch fliG
GG-10 flagellar basal-body rod flgC
GG-12 flagellar MS-ring fliF
GG-13 flagellar hook-associated protein 1 flgK
GG-18 flagellar motor switch fliN
GG-21 flagellar motor switch fliM
GG-27 flagellar hook-associated protein 3 flgL
GG-29 flagellar hook-associated protein 2 fliD
GG-8 flagellin fliC
GG-17 motility protein A motA
GG-74 flagellar protein fliS
GG-20 motility protein B motB
GG-1 methyl-accepting chemotaxis protein

GG-11 chemotaxis protein cheA
GG-45 methyl-accepting chemotaxis protein
GG-73 methyl-accepting chemotaxis protein
GG-38 chemotaxis protein cheV
GG-15 chemotaxis protein cheW
GG-2 chemotaxis methyltransferase cheR
GG-30 glutamate methylesterase cheB

GG-32 flagellar L-ring protein precursor flgH
GG-36 flagellar P-ring protein precursor flgI

GG-9 RNA-polymerase sigma-54 factor
GG-14 transcription factor, sigma-54-dependent
[Figure axes: Motility, GG index]
These results are based on no prior knowledge, apart from genome sequences along with their phenotypic annotations
Phylogenetic profiles / modules for motility
[Figure: Motility; fliI, cheY, …, fliO, cheZ]
E. coli chemotaxis and flagellum modules
Some E. coli genes are not recovered. Why?
GG-9 PAL peptidoglycan-associated lipoprotein
GG-10 tolQ/exbB protein
GG-12 tolB protein
GG-72 lipid A biosynthesis lauroyl acyltransferase

GG-2 3-deoxy-manno-octulosonate cytidylyltransferase
GG-3 UDP-3-O glucosamine N-acyltransferase
GG-4 lipid-A-disaccharide synthase
GG-5 polysialic acid capsule expression protein
GG-7 UDP-3-O N-acetylglucosamine deacetylase
GG-8 3-deoxy-D-manno-octulosonic-acid transferase
GG-11 tetraacyldisaccharide 4'-kinase
GG-1 outer membrane protein yaeT

GG-68 glutaredoxin 3
GG-29 2-octaprenyl-6-methoxyphenol hydroxylase
GG-31 glutathione synthetase
GG-18 glutaredoxin-related protein
GG-73 coproporphyrinogen III oxidase, aerobic
GG-107 hydroxyacylglutathione hydrolase

GG-20 HlyD family secretion protein
GG-96 HlyD family secretion protein
GG-53 HlyD family secretion protein
GG-111 membrane fusion protein (MFP)
GG-15 pyridoxal phosphate biosynthetic protein
GG-52 pyridoxal phosphate biosynthetic protein
GG-35 ABC transporter, permease
Phylogenetic profiles / modules for Gram-staining
GG-8 sporulation-blocking protein yabP
GG-130 sporulation sigma-E factor processing peptidase
GG-58 stage III sporulation protein AC
GG-6 stage III sporulation protein AD
GG-3 stage III sporulation protein D

GG-63 spore-cortex-lytic enzyme
GG-87 spore germination protein
GG-104 spore protease
GG-136 spore protease related
GG-71 stage III sporulation protein AB
GG-103 stage III sporulation protein AE
GG-132 stage III sporulation protein AG
GG-95 stage II sporulation protein E
GG-137 stage II sporulation protein M
GG-11 stage II sporulation protein P
GG-134 stage II sporulation protein R
GG-135 stage IV sporulation protein
GG-76 stage IV sporulation protein A
GG-46 stage IV sporulation protein B
GG-40 stage V sporulation protein AC
GG-34 stage V sporulation protein AD
GG-15 stage V sporulation protein AF
GG-37 translocation-enhancing protein
GG-94 hypothetical membrane protein
GG-127 hypothetical membrane protein

GG-49 small acid-soluble spore protein I sspI
GG-69 spoVID-dependent spore coat assembly factor
GG-101 spore coat protein
GG-52 spore coat protein E
GG-99 spore coat related, putative
GG-97 spore cortex biosynthesis, putative
GG-84 spore germination protein
GG-90 spore germination protein
GG-55 spore germination protein C1
GG-62 sporulation initiation phosphotransferase
GG-113 stage III sporulation protein AF
GG-64 stage IV sporulation protein FA
GG-91 stage VI sporulation protein D
GG-54 abi, CAAX amino terminal protease
GG-42 cytochrome C-550/C-551
GG-53 cytochrome C oxidase subunit IV
GG-36 menaquinol-cytochrome C reductase qcrC
GG-50 lipoprotein, putative
GG-18 prespore-specific transcriptional regulator
GG-66 putative lipoprotein
GG-56 putative ribonuclease H
GG-26 reductase ribT / acetyltransferase gnaT
GG-124 hypothetical membrane protein
GG-118 hypothetical membrane protein
GG-29 hypothetical cytosolic protein
GG-38 hypothetical cytosolic protein
GG-120 hypothetical cytosolic protein
GG-24 hypothetical protein
GG-27 hypothetical protein
GG-28 hypothetical protein
GG-30 hypothetical protein
GG-31 hypothetical protein
GG-32 hypothetical protein
GG-33 hypothetical protein
GG-41 hypothetical protein
GG-43 hypothetical protein
GG-47 hypothetical protein
GG-60 hypothetical protein
GG-61 hypothetical protein
GG-65 hypothetical protein
GG-67 hypothetical protein
GG-68 hypothetical protein
GG-70 hypothetical protein
GG-72 hypothetical protein
GG-73 hypothetical protein
GG-83 hypothetical protein
GG-88 hypothetical protein, HD domain
GG-100 hypothetical protein (ecsc)
GG-114 hypothetical protein
GG-116 hypothetical protein
GG-117 hypothetical protein
Focused hypotheses for experimental validation
• Community sequencing
Conclusion
• Systematic association of genotype / phenotype for several phenotypes
• Clustering reveals robust modules that correspond to protein complexes, signal transduction pathways and enzymatic pathways
• Many predictions that can be verified experimentally
Acknowledgements
• Saeed Tavazoie
• Noam Slonim
• Tavazoie lab members