EVOLUTIONARY HISTORY OF PHAGES WITH dsDNA GENOMES Arcady Mushegian Stowers Institute, Kansas City,...

Post on 28-Mar-2015


EVOLUTIONARY HISTORY OF PHAGES WITH dsDNA GENOMES

Arcady Mushegian, Stowers Institute, Kansas City, USA (Image: A. Merkov © 2007, http://wsbs-msu.ru/foto)

Galina Glazko, University of Rochester, USA

Vladimir Makarenkov, Université du Québec à Montréal, Canada

Jing Liu, U of Kansas - Utah Law

ICTV CLASSIFICATION: VIRION SHAPE PLUS A FEW MOLECULAR CHARACTERS

Siphoviridae (lambda)
Myoviridae (T4)
Podoviridae (P60)
Tectiviridae (Bam35c & PRD1)
Lipothrixviridae (SIFV)
Rudiviridae (SIRV2)
Fuselloviridae (SSV1)
Corticoviridae (PM2)
Plasmaviridae (L2)

EVOLUTIONARY HISTORY OF PHAGES : A GENOME-BASED APPROACH ?

WHY THE ICTV APPROACH IS NOT ENOUGH :

- LOW RESOLUTION OF STRUCTURAL TRAITS

- STRUCTURE CONVERGENCE ?

- NO WAY TO ACCOUNT FOR HGT

CAN WE DO BETTER WITH GENOMES ?

- ARE THERE ENOUGH MOLECULAR TRAITS ?

- WILL HGT SWAMP EVERYTHING ?

RECOMBINATION DOES NOT HAVE TO BE HOMOLOGOUS

Modified from J.G.Lawrence et al., J.Bacteriol. 2002

PHYLOGENY BASED ON GENE CONTENT

THE CLOSER TWO GENOMES ARE, THE MORE GENES THEY HAVE IN COMMON

- ALL CLASSES OF METHODS CAN BE APPLIED

- NO NEED FOR OMNIPRESENT GENES

- HOW TO MEASURE AND NORMALIZE ?

USED WITH BACTERIA (Koonin, Bork )

AND PHAGES (Rohwer and Edwards, J. Bacteriol. 2002)

- HGT WAS STILL UNACCOUNTED FOR
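A minimal sketch of the gene-content idea, assuming each genome is represented as a set of orthologous-group identifiers and that shared gene counts are normalized with the Jaccard coefficient (one possible normalization among several). The genome names and POG identifiers are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of gene-family identifiers."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / union


def distance_matrix(genomes: dict) -> dict:
    """Pairwise gene-content distances, keyed by sorted (name1, name2) pairs."""
    return {
        (g1, g2): jaccard_distance(genomes[g1], genomes[g2])
        for g1, g2 in combinations(sorted(genomes), 2)
    }


# Hypothetical toy data: three genomes described by the gene families they contain.
toy = {
    "phageA": {"POG1", "POG2", "POG3", "POG4"},
    "phageB": {"POG1", "POG2", "POG3", "POG5"},
    "phageC": {"POG1", "POG6"},
}
dist = distance_matrix(toy)
for pair, d in dist.items():
    print(pair, round(d, 3))
```

The resulting matrix could be passed to any distance-based tree-building method. Note that no single gene has to be present in every genome for the distances to be defined.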

“RETICULATE EVOLUTION” PROPOSAL FOR PHAGE TAXONOMY (Lawrence et al., 2002)

kingdom: Bacteriophages
domains: dsDNA, ssDNA, dsRNA, dsRNA
divisions: tailed, filamentous, icosahedral, …, …
modi: modus I, modus II, modus III, modus IV, …, …

“ Phage SfV might belong to the domain of dsDNA viruses, the division of tailed bacteriophages, but to at least 3 modi : (i) Phages with HK97-like head proteins and maturation processes, (ii) Phages with Mu-like contractile tails, and (iii) Integrase-mediated temperate phages. ”

BUT ….

… HOW DO WE DERIVE THE MODI ?

… WHAT IS THE EVOLUTIONARY SCENARIO ?

( esp. given the claim of “a new view of viral evolution, classification, and taxonomy” ?? )

ON THE BRIGHT SIDE :

THE IMPORTANCE OF HGT

A POST-MODERN DISTRACTION: A TREE ( OF LIFE ) OR NOT A TREE?

THE GOAL OF EVOLUTIONARY RECONSTRUCTION

IS NOT “ TO BUILD A TREE ”, BUT TO LEARN WHAT HAPPENED

OR ?

W.F.Doolittle, Science 1999 © AAAS

PHAGE ORTHOLOGOUS GROUPS ( POGs ) - FOLLOWING THE NCBI COG FRAMEWORK

Tatusov et al., Science 1997; NCBI 1997-2007

VECTORS OF GENE CONTENT IN PHAGES

THE MAIN OBSERVATIONS

- FAT-TAILED DISTRIBUTION, MANY ZEROS

- HIGH POG CONTENT IN GENOMES

- av. 52 % (mostly 30 - 70 %)

- av. 42 % even for the unclassified phages

- “PHAGENESS QUOTIENT”

$$PQ_i = \log \frac{f_i^{Vp}}{f_i^{Mp}}, \quad \text{where} \quad f_i^{Vp} = \frac{n_i^{Vp}}{V}, \quad f_i^{Mp} = \frac{n_i^{Mp}}{M}$$

i.e., “is this POG more likely to be drawn from a phage or from a cellular organism?”
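A small sketch of how the phageness quotient of one POG could be computed as PQ = log(f_Vp / f_Mp). The slides do not spell out the symbols, so the reading used here is an assumption: n_vp is the number of matches of the POG among V viral (phage) proteins, and n_mp is the number of matches among M microbial (cellular) proteins. The base of the logarithm is not given either, so the natural log is used.

```python
import math


def phageness_quotient(n_vp: int, V: int, n_mp: int, M: int) -> float:
    """PQ = log(f_Vp / f_Mp) with f_Vp = n_vp / V and f_Mp = n_mp / M.

    Assumed meaning of the arguments (not stated on the slides):
    n_vp, V -- matches of this POG among V viral proteins
    n_mp, M -- matches of this POG among M microbial proteins
    """
    f_vp = n_vp / V
    f_mp = n_mp / M
    if f_mp == 0:
        # No cellular matches: a phage-specific POG, PQ is infinite.
        return math.inf if f_vp > 0 else math.nan
    if f_vp == 0:
        return -math.inf
    return math.log(f_vp / f_mp)


# Hypothetical counts, for illustration only.
print(phageness_quotient(5, 100, 0, 1000))    # phage-specific POG
print(phageness_quotient(10, 100, 10, 1000))  # ten times more frequent in phages
```

Under this reading, a POG with no matches in cellular organisms gets an infinite PQ, and a positive finite PQ means the POG is relatively more frequent among phage proteins.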

WHY HIGH PQ IS IMPORTANT

PQmax = ∞ (phage-specific POGs)

82% OF POGs HAVE PQ = ∞

+ 8% - HIGH PQ, FORM A CLADE

- MOST OF THE POGS TAKE NO PART IN HOST-VIRUS GENE TRANSFER

- WE CAN RESTATE THE TASK AS DETECTION OF PHAGE-PHAGE HGT

$$PQ_i = \log \frac{f_i^{Vp}}{f_i^{Mp}}$$

TREES FROM GENE CONTENT, NO HGT YET

- PRETEND THAT DESPITE HGT, EVOLUTION OF PHAGES IS TREE-LIKE

- BUILD A CONVENTIONAL TREE, VERIFY THAT THE SIGNAL IS TRUE, THEN INFER HGT

- DISTANCE METHODS REQUIRE PROPER DISTANCE MEASURE ( SEPARATE STORY …)

[Figure: trees from gene content, NJ and MrBayes panels. Legend: Siphoviridae, Podoviridae, Myoviridae, Fuselloviridae, Tectiviridae, Unclassified]

TREES FROM GENE CONTENT

[Figure legend: Siphoviridae, Podoviridae, Myoviridae, Fuselloviridae, Tectiviridae, Unclassified]

TREES FROM GENE CONTENT - CONCLUSIONS

- VERTICAL SIGNAL IS CONSIDERABLE

- RESAMPLING, SIMULATIONS, COMPATIBILITY

- 18 WELL-SUPPORTED GROUPS

- 71 % OF ALL PHAGES

- SOME GROUPS INCLUDE PHAGES WITH DIFFERENT MORPHOLOGY

- THE LARGEST 3 ICTV MORPHOTYPES DO NOT RESOLVE AS MONOPHYLETIC

GROUPS WITH SIMILAR MORPHOLOGY

group 2: staphylococci phages
group 3: Sfi21-like siphoviruses
group 7: fuselloviruses
group 11: -like siphoviruses
group 14: T4-like myoviruses
group 18: P2-like myoviruses
and groups 4, 5, 6, 8, 10

[Figure legend: Siphoviridae, Myoviridae, Fuselloviridae, Unclassified]

GROUPS WITH DISSIMILAR MORPHOLOGY

group 9: PZA-like podoviruses
group 13: mycobacteriophages
group 15: T7-like podoviruses
group 16
and groups 1, 12, 17

[Figure legend: Siphoviridae, Podoviridae, Myoviridae, Tectiviridae, Unclassified]

HOW TO INFER RETICULATIONS ?

IN PRACTICE : T-REX (MAKARENKOV, 1999-2007) - SPR (SUBTREE PRUNING AND REGRAFTING) MOVES ON T_GC THAT MINIMIZE THE DISTANCE TO T_SF, WHILE MAINTAINING SUB-TREE TOPOLOGIES

HGT EVENTS IN NUMBERS

• 294 HGT EVENTS, 90 % WITHIN GROUPS

• 114 (OF 158) PHAGES ARE INVOLVED

• GENES FROM 229 POGs HAVE BEEN TRANSFERRED

• REMOVE THESE POGs : GROUPS STAY TOGETHER, SOME LOSE 1-2 MEMBERS

FREQUENCY DISTRIBUTIONS OF ALL HGT EVENTS ARE FAT-TAILED

MOST PHAGES RARELY EXCHANGE GENES

MOST POGs ARE NEVER TRANSFERRED

THE HGT DEBATE ( NOT ONLY IN PHAGES ) IS ABOUT THE OPPOSITE TAILS OF THE SAME DISTRIBUTION

WHAT NEXT

- MORE PHAGES ( ~500 )
- MORE TRAITS, EVEN FOR CURRENT COLLECTION
- HIGHER DENSITY OF TAXON SPACE
- TRAITS OTHER THAN ORFs - COS, PAC SITES ?
- BETTER METHODS
- ARE ICTV FAMILIES MONOPHYLETIC ?
- RELATIONSHIP WITH HERPESVIRUSES ?
- ON TO BACTERIAL EVOLUTION ?

http://www.stowers-institute.org/ScientistsSought/TrainingPrograms.asp

52% of the analyzed phage proteins are clustered into 981 Phage Orthologous Groups (POGs)

ICTV family        Number of   Number of   Number of proteins   POG
                   genomes     proteins    in POGs              coverage

Myoviridae              28        3538          1526              43%
Siphoviridae            80        5815          3431              59%
Podoviridae             31        1520           879              58%
Tectiviridae             2          55            10              18%
Corticoviridae           1          22             3              14%
Plasmaviridae            1          14             2              14%
Fuselloviridae           4         134            79              59%
Lipothrixviridae         1          72             5               7%
Unclassified            16        1052           443              42%

Total number           164       12222          6378              52%
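The coverage column is simply the number of proteins in POGs divided by the total number of proteins for each family. A short sketch that recomputes it from the counts in the table (the data structure and variable names are illustrative, not from the original analysis):

```python
# (genomes, proteins, proteins in POGs) per ICTV family, copied from the table.
table = {
    "Myoviridae": (28, 3538, 1526),
    "Siphoviridae": (80, 5815, 3431),
    "Podoviridae": (31, 1520, 879),
    "Tectiviridae": (2, 55, 10),
    "Corticoviridae": (1, 22, 3),
    "Plasmaviridae": (1, 14, 2),
    "Fuselloviridae": (4, 134, 79),
    "Lipothrixviridae": (1, 72, 5),
    "Unclassified": (16, 1052, 443),
}

# Per-family POG coverage in percent, rounded to the nearest integer.
coverage = {
    family: round(100 * in_pogs / proteins)
    for family, (_, proteins, in_pogs) in table.items()
}

# Column totals and overall coverage.
total_genomes = sum(g for g, _, _ in table.values())
total_proteins = sum(p for _, p, _ in table.values())
total_in_pogs = sum(k for _, _, k in table.values())
total_coverage = round(100 * total_in_pogs / total_proteins)

for family, pct in coverage.items():
    print(f"{family:<18}{pct:>4}%")
print(f"{'Total':<18}{total_coverage:>4}%  ({total_genomes} genomes)")
```

The recomputed totals agree with the last row of the table, which is a quick consistency check on the per-family counts.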

POGs shared by phage genomes are suitable characters in evolutionary reconstruction

461 POGs belong to 14 functional categories

L, replication, recombination, and repair

K, transcription

F, nucleotide transport and metabolism

X, virion assembly

S, unknown function

X,Y,Z,W,V,U,A – phage specific categories

Phages vary significantly in genome size and content

[Figure labels: 381, 8]

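How to normalize shared gene count I. Correlation between gene-content trees and 16S rRNA-based tree

JC: Jaccard coefficient distance

$$d_{JC}(G_1, G_2) = 1 - \frac{k}{N_1 + N_2 - k}$$

MB: Maryland bridge distance

$$d_{MB}(G_1, G_2) = 1 - \frac{k\,(N_1 + N_2)}{2 N_1 N_2}$$

WA: Weighted average distance

$$d_{WA}(G_1, G_2) = 1 - \frac{k \sqrt{N_1^2 + N_2^2}}{\sqrt{2}\, N_1 N_2}$$

CORR: Standard correlation distance

$$d_{CORR}(G_1, G_2) = 1 - \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X \sigma_Y}$$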

How to normalize shared gene count II. The effect of differences in genome sizes

[Figure: two panels plotting distance values against genome size N2 for the four measures, one panel for number of shared genes k = 100 and one for k = 500. Size of Genome 1: N1 = 1000; size of Genome 2: N2.]

JC: Jaccard coefficient distance; MB: Maryland bridge distance; WA: Weighted average distance; CORR: Standard correlation distance
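The effect of genome size can be sketched numerically. The functions below assume the usual count-based forms of three of the measures, with k the number of shared genes and N1, N2 the genome sizes: d_JC = 1 - k/(N1 + N2 - k), d_MB = 1 - k(N1 + N2)/(2 N1 N2), and d_WA = 1 - k sqrt(N1^2 + N2^2)/(sqrt(2) N1 N2). The correlation distance is left out because it needs the full gene-content vectors rather than counts. Function names are illustrative.

```python
import math


def d_jc(k: int, n1: int, n2: int) -> float:
    """Jaccard coefficient distance: shared genes over the union of both genomes."""
    return 1.0 - k / (n1 + n2 - k)


def d_mb(k: int, n1: int, n2: int) -> float:
    """Maryland bridge distance: one minus the mean of k/N1 and k/N2."""
    return 1.0 - k * (n1 + n2) / (2.0 * n1 * n2)


def d_wa(k: int, n1: int, n2: int) -> float:
    """Weighted average distance: k normalized by sqrt(2)*N1*N2/sqrt(N1^2+N2^2)."""
    return 1.0 - k * math.sqrt(n1**2 + n2**2) / (math.sqrt(2.0) * n1 * n2)


# Genome 1 fixed at N1 = 1000 genes, k = 500 shared genes, Genome 2 of varying size.
N1, K = 1000, 500
for n2 in (500, 600, 800, 1000):
    print(n2, round(d_jc(K, N1, n2), 3), round(d_mb(K, N1, n2), 3), round(d_wa(K, N1, n2), 3))
```

With k held fixed, all three distances grow as N2 grows, but at different rates, which is why the choice of normalization matters when genomes differ a lot in size.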

Possible evolutionary links between viruses from different host domains

Hendrix 1999 Curr. Biol.

Shared colors indicate proposed evolutionary connections between relevant viruses

Herpesvirus proteases (family S21)

1 1 2 3 4 2 3 4 5 6 7 5
HSV-2       (19)PIYVAGFLALY(6) ELAL-DPDTVRAAL(5) LPINVDHR(3) EVGRVL A-VVND(2)GPFFVGLI(3)QLERVL(20)RLLYLITNYLPSVSL-ST(15)----------FA--HVALCAI(13)LDAAIAP(73)
VZV         (1)ALYVAGYLALY(5) ELNI-TPEIVRSAL(5) IPINIDHR(3) VVGEVI A-IIED(2)GPFFLGIV(3)QLHAVL(20)RALYLVTNYLPSVSL-SS(5) ----------FT--HVALCVV(13)PESSIEP(66)
HCMV        (12)PVYVGGFLARY(7) ELLL-PRDVVEHWL(13)LPLNINHD(3) VVGHVA A-MQSV(2)GLFCLGCV(3)RFLEIV(21)KVVEFLSGSYAGLSL-SS(27)----------FK--HVALCSV(13)PEWVTQR(73)
EBV         (5)SVYVCGFVERP(7) CLHL-DPLTVKSQL(5) LPLTVEHL(3) PVGSVF G-LYQS(2)GLFSAASI(3)DFLSLL(20)PKVEALHAWLPSLSL-AS(19)----------FD--HVSICAL(13)LAWVLKH(70)
KSHV        (3)GLYVGGFVDVV(7) ELYL-DPDQVTDYL(5) LPITIEHL(3) EVGWTL G-LFQV(2)GIFCTGAI(3)AFLELA(20)PLLEILHTWLPGLSL-SS(16)----------FQ--HVSLCAL(13)AEWVVSR(70)

Bacteriophage prohead proteases (family U35)

1) Prophages

CP-933C     (20)SNTLTGYVVRW(12)EKFQ--RGAFTEWL(5) VRGLYEHD(3) LLGRTR(3)LKLEED(2)GLRFELTP(1)DTSTGR(1) VIELVKRGDISGMSF-GF(16)TVLVA-E---LY--EITVTSV(3) PDSGVEL(28)
LambdaBa04  (25)NRTLIGYAVKW(14)EQFK--NGAFTETL(4) QRFLWSHD(3) VLGRTK(3)LRLNED(2)GLRFELDL(1)DTTLGN(1) TYKSIKRGDVDGVSF-GF(17)TVTKA-K---LL--EVSAVAF(3) PDSEVSA(27)
Lp3         (35)GKTISGYAIVW(11)EVVT--PKALDGVD(3) VLMLNNHD(3) VLASVK(3)LTLETD(2)GLHFTAQL(1)NTSFAN(1) VYEEVQSGNVDSCSF-GF(19)TINQVKS---LF--DVSVVAV(3) DDTNVQV(322)

2) Siphoviridae

HK97        (22)QGIFEGYASVF(7) DIIL--PGAFKNAL(6) VAMFFNHK(4) PVGKWD S-LAED(2)GLYVRGQL(2)GHSGAA(1) LKAAMQHGTVEGMSV-GF(14)IFKNIQA---LR--EISVCTF(3) EQAGIAA(68)
P27         (30)SGEFEGYGSVF(7) DVVV--PGAFTTTL(9) PALLWQHR(3) PIGVY- TEMKED(2)GLYVRGRL(3)DDPLAK(1) AHAHMKAGSLTGLSI-GY(14)LLKEI-D---LW--EVSLVTF(3) DEARISD(61)
phi-C31     (23)RISMRGYAYRF(11)ERIV--PGAGAPSL(4) VYATFNHD(3) LLGRTS(3)LRVGED(2)GGWYEIDL(1)DTTVGR(1) VAKLLKRGDLQGSSF-TF(22)EITAM-D---VV--ELGPVVN(3) PTTQASL(44)
psiM100     (128)KRLVTGPVLVP(12)EQVE--EVAYKFME(2) QNIDIMHR(3) VARPVE(3)LRADEE(2)GVHLPRGT(3)TARIYD(2) IWEGVKTGKYTGFSI-TA(13)TLRELGW---PW--EVVTISI(3) PKAKYLS(135)
psiM2       (82)QRIITGPVLVP(12)EQVE--RVAYKFME(2) QNVDILHR(3) VAKPVE(3)LREDTM(2)GVDLPEGT(3)SAKVYD(2) TWRGILEGKYQGFSI-TA(8) TLADIGW---PF--DVVTVSI(3) PKARYLS(131)
Omega       (35)TLVLEGYASTF(16)EQLD--RRAFEKTL(5) LHLLVNHA(2) PLARTK(3)LDLSVD(2)GLKVVARL(2)RDPDVQ(1) LAVKMERGDMDEMSF-AF(21)TITEV-S---LHKGDVSVVNF(3) PTTSVGL(235)
phi3626     (26)TKTITGYASKY(16)EVVA--EGAFDNSL(4) IKALYNHN(3) VLGSTK(3)LRLESD(2)GLRFEIDL(1)NTTVAN(1) LYESVKRGDVDGTSF-GF(20)TLLEI-D---LY--EISPTPF(3) EDTEVDC(26)
phiPV83     (19)EMVIEGYALKF(11)ETIS--RRALENTS(3) VRCLVDHI(3) IIGRTK(3)LELETD(2)GLKYRCKL(1)NTTFAR(1) LYENMRVGNINQCSF-GF(20)TLTAIRE---LT--DVSVVTY(3) KDTDVKP(31)
A2          (21)PAVIEGYALKF(15)EHID--PHALDNAD(3) VVALFNHD(3) VLGRTG(2)LELTVD(2)GLKYTLTP(1)DTQLGR(1) LLENVRRGIISQSSF-AF(23)TINNIDH---LF--DVSPVTT(3) PDTEVKV(38)
bIL285      (24)EKIISGYFIVF(11)EEIS--PESFDNVD(3) VRALIDHE(3) VLGRTK(3)LTLSVD(2)GVYGEIKV(2)NDTEAM(1) LYSRVQRGDVDQCSF-GF(17)TIKAI-E---LF--EVSVVTF(3) ADTAVEA(33)
bIL309      (24)IGQIAGYAIKF(11)EYIA--PIALDNVD(3) VLALYNHD(3) VLGRVD(3)LKLSID(2)GLHFVLDM(1)DTTVGH(1) VYNNIKAGNLKGMSF-GF(18)IINQLQT---LS--EISVVSR(3) DDTSVQV(29)

3) Podoviridae

ST64B       (30)SGEFEGYGSVF(7) DVVM--SGAFAASL(8) PALLWQHR(3) PIGVY- TEMKED(2)GLYVKGRL(3)DDPLAK(1) AHAHMKAGSLTGLSI-GY(14)LLKEI----DLW--EVSLVTF(3) DEARISD(61)
phage V     (21)PAHIIGYGSVF(11)EIIR--PGAFDDVL(3) VRALFNHD(3) ILGRSA(3)LNLSVD(2)GLRYDIQA(1)ETQTIR(2) VLAPMQRGDINQSSF-AF(17)VIREITRFSRLL--DVSPVTY(3) QEADSAV(34)

4) Myoviridae

P2          (6) KFFRIGVEGDT(1) DGRVISAQDIQEMA(9) CRINLEHL(10)RYGDV- AELKAE(8)KGKWALFA KITPTD(1) LIAMNKAAQKVYTSMEIQ(5) TGKCYLVGLAVT--DDPASLG(22)PENLISV(121)
HP1         (8) DFICIATSGYT(1) DGRQITAQELHEMA(9) ANLWPEHR(3) NMGQV- IELKAE(3)KGETQLFA IIAPNK(1) LIEYNRAGQYLFTSIEIT(5) SGKAYLSGLGVT--DSPASVG(21)VDFSAKE(145)
K139        (5) DWVIVATAGTT(2) DGRVISESWINDMA(9) ALIWPEHY(10)NWGEV- EELKAG(2)KDKLRLFA KLTPNH(1) LLEANKDGQKLFSSIEPE(5) EGRCYLLGLAVT--DSPASSG(15)LECSALE(148)
Mu          (19)GWCQLLPAGHF(13)QGWFIDGEIAGRLV(9) VLIDYEHN(14)AAGWFN(1)DEMQWR -EGEGLFI HPRWTA(1) AQQRIDDGEFGYLSAVFP(4) TGAVLQIRLAALT-NDPGATG(16)QENKPMN(181)

5) Unclassified dsDNA phages

BFK20       (19)NGTFTAYASVF(7) DVVK--SGAFADTL(9) LPVLYGHD(3) PFSNIG(2)VEAEED(2)GLKITGKL(2)DNPKAA(1) VYKLLKEKRLSQMSF-AF(17)SIDKV-K---LY--EVSVVPI(3) QETEILA(43)
phBC6A52    (17)QVILDGYVNVV(15)ERIV--PKTFEKAL(5) VDLLFNHD(3) NLGSIE(3)LELYED(2)GLRAIA-- -TVTDE(1) VIKKARNKELRGWSF-GF(17)SIEEL-E---LL--EVSILDM(5) VATSIET(42)
phi13       (19)EMVIEGYALKF(11)ETIS--RRALENTD(3) VRCLVDHI(3) IIGRTK(3)LELETD(2)GLKYRCKL(1)NTTFAR(1) LYENMRVGNINQCSF-GF(20)TLTAIRE---LT--DVSVVTY(3) KDTDVKP(31)

Bacteriophage prohead proteases (family U9)

Aeh1        (31)KLYIEGIFMQS(12)KVL---QEAVTKYI(8) ALGELNHP(3) NVDPLH(1)AIIIEK(4)GNDVWGRA(3)EGDYAE(3) TAALIRAGWIPGVSSRGL(11)EVQEGFKLTVGV--DVVWGPS(3) PNAYVKP(32)
T4          (36)GLYIEGIFMQA(12)RIL---EKAVKDYI(8) ALGELNHP(3) NVDPMQ(1)AIIIED(4)GNDVYGRA(3)EGDHGP(3) LAANIRAGWIPGVSSRGL(11)IVNEGFKLTVGV--DAVWGPS(1) PDAWVTP(30)
S-PM2       (22)HLYIEGVFLQS(12)SVL---EKEVSRYN(8) ALGELGHP(3) TVNLDR(1)SHRITS(4)GSNFIGKA(3)ATPMGN(1) AKSLLDEGVRLGVSSRGM(11)VMDDFMHAATAA--DIDADPS(3) PDAFVNG(49)
RM378       (4)DKTYTALIMEA(12)EAV---KKAVERMK(7) MYGELDHP(8) FVSLER(1)AVQWVD(4)GNKVYGKF(3)PTPYGN(1) VKSLLENGINFGFSLRGS(14)IVDDFFIT--AI--DVVAVPS(3) QSARVLQ(24)

Many dsDNA phage prohead proteases are herpesvirus assemblin-like serine proteases

Liu & Mushegian, 2004 J. Bacteriol.