Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 5997-6001, August 1988
Evolution
Western equine encephalitis virus is a recombinant virus
(RNA recombination/Alphavirus/evolution of RNA viruses)
CHANG S. HAHN, SHLOMO LUSTIG*, ELLEN G. STRAUSS, AND JAMES H. STRAUSS†
Division of Biology, 156-29, California Institute of Technology, Pasadena, CA 91125
Communicated by James Bonner, April 14, 1988
ABSTRACT The alphaviruses are a group of 26 mosquito-borne viruses that cause a variety of human diseases. Many of the New World alphaviruses cause encephalitis, whereas the Old World viruses more typically cause fever, rash, and arthralgia. The genome is a single-stranded nonsegmented RNA molecule of + polarity; it is about 11,700 nucleotides in length. Several alphavirus genomes have been sequenced in whole or in part, and these sequences demonstrate that alphaviruses have descended from a common ancestor by divergent evolution. We have now obtained the sequence of the 3'-terminal 4288 nucleotides of the RNA of the New World Alphavirus western equine encephalitis virus (WEEV). Comparisons of the nucleotide and amino acid sequences of WEEV with those of other alphaviruses clearly show that WEEV is recombinant. The sequences of the capsid protein and of the (untranslated) 3'-terminal 80 nucleotides of WEEV are closely related to the corresponding sequences of the New World Alphavirus eastern equine encephalitis virus (EEEV), whereas the sequences of glycoproteins E2 and E1 of WEEV are more closely related to those of an Old World virus, Sindbis virus. Thus, WEEV appears to have arisen by recombination between an EEEV-like virus and a Sindbis-like virus to give rise to a new virus with the encephalogenic properties of EEEV but the antigenic specificity of Sindbis virus. There has been speculation that recombination might play an important role in the evolution of RNA viruses. The current finding that a widespread and successful RNA virus is recombinant provides support for such an hypothesis.
The 26 members of the Alphavirus genus of the family Togaviridae are mosquito-borne viruses that form an important group of disease agents (1-3). The New World alphaviruses include western equine encephalitis virus (WEEV) and eastern equine encephalitis virus (EEEV), both of which are capable of causing encephalitis in humans and causing severe disease in horses. WEEV has a wide geographic distribution, being found from western Canada to Mexico and, discontinuously, to Argentina. WEEV is transmitted in the western United States by the mosquito Culex tarsalis; birds serve as an important vertebrate reservoir. In the eastern United States, WEEV is replaced by Highlands J virus (HJV), whose primary vector is Culiseta melanura. From serological studies (3, 4) and from limited sequencing studies (5, 6), WEEV and HJV are known to be very closely related, and HJV can be considered to be a strain of WEEV (2). In the eastern United States, the range of HJV overlaps that of EEEV, whose primary vector is also Cs. melanura. Other New World alphaviruses include Venezuelan equine encephalitis virus (VEEV), found in Central and South America; Fort Morgan virus, found in Colorado; and Aura virus, found in South America.
The Old World alphaviruses include Sindbis virus, the prototype alphavirus; Semliki Forest virus; Chikungunya virus; O'Nyong-nyong virus; and Ross River virus. Sindbis and Semliki Forest viruses have been intensively studied as models for alphavirus replication (7). Sindbis virus is widely distributed, being found in Europe, India, southeast Asia, Australia, and Africa. Close relatives of this virus, such as Ockelbo virus in Europe (8) and Babanki virus in Africa, cause disease in humans characterized by fever, rash, and arthritis. Chikungunya and O'Nyong-nyong viruses have caused large epidemics in Africa of a dengue-like disease also characterized by fever, rash, and arthralgia. Ross River virus is the causative agent of epidemic polyarthritis in Australia and the South Pacific.
Complete or partial RNA sequences have been obtained for Sindbis virus (9), Semliki Forest virus (10-12), Ross River virus (13), EEEV (14), and VEEV (15). Comparison of these nucleotide sequences and their encoded amino acid sequences has demonstrated that the alphaviruses are related by linear descent from a common ancestor (7). The relationships found are compatible, for the most part, with those derived from studies of serological cross-reactivity, which depends only upon antigenic epitopes in the structural proteins. In serological studies, however, WEEV has always been something of a puzzle. It is a New World virus that often causes encephalitis, but serologically it is most closely related to Sindbis virus, an Old World alphavirus not normally associated with encephalitis. To explore the relationship of WEEV to other alphaviruses, we have obtained the sequence of the 3'-terminal 4288 nucleotides of the WEEV genome‡ and found that WEEV appears to have arisen by recombination between an EEEV-like virus and a Sindbis-like virus.
MATERIALS AND METHODS
Virus RNA Preparation. WEEV RNA [strain BFS1703, isolated from Cx. tarsalis in July 1953 in Kern County, California (16)] was obtained from Mark Stanley and James Hardy (University of California, Berkeley). The virus had been passed twice by i.c. inoculation of suckling mice and four times (including three plaque isolations) in VERO cells. For RNA preparation, virus grown in VERO cells was purified by pelleting onto a 30% sucrose cushion followed by isopycnic banding in Nycodenz (Nyegaard, Oslo). After pelleting and dissociation in NaDodSO4, the RNA was extracted by phenol/chloroform treatment, precipitated with ethanol, purified on a discontinuous sucrose gradient, and concentrated by ethanol precipitation.
Cloning and Sequencing. Clones containing the 3'-terminal 4288 nucleotides of WEEV RNA were obtained by using an oligo(dT)-tailed vector as a primer as described (17). Clones were sequenced by using the chemical sequencing method (18, 19).
Abbreviations: WEEV, western encephalitis virus; EEEV, eastern encephalitis virus; VEEV, Venezuelan equine encephalitis virus; HJV, Highlands J virus.
*Present address: Israel Institute for Biological Research, P.O. Box 19, Ness-Ziona, 70450, Israel.
†To whom reprint requests should be addressed.
‡The sequence reported in this paper is being deposited in the EMBL/GenBank data base (IntelliGenetics, Mountain View, CA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03854).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
RESULTS
Partial Sequence of WEEV RNA. The translated sequence of the 3'-terminal 4170 nucleotides of the WEEV genome is shown in Fig. 1. This sequence begins in the region encoding the carboxyl terminus of nonstructural protein 4, continues through the junction region between the nonstructural and structural proteins containing the start of the subgenomic mRNA that is translated to give the structural proteins (20), and progresses through the coding sequence of the three structural proteins of the virus (a nucleocapsid protein, C, and two envelope glycoproteins, E2 and E1) and finally through the 3'-terminal untranslated sequence, which ends in a poly(A) tract.
We have previously sequenced the amino termini of the three structural proteins of the McMillan strain of WEEV (isolated in 1941 in Canada from the brain of a fatal human case) and thus established the start points of the structural proteins (21). Comparison of the amino acid sequence of the McMillan strain with that deduced here for the BFS1703 strain (isolated from mosquitos in 1953 in California) reveals four amino acid differences in 142 amino acids for which comparison is possible (one in C, one in E2, and two in E1). However, reevaluation of the original data for the McMillan strain suggests that the apparent difference in the capsid proteins may result from a miscall in the McMillan sequence and that there are no differences between the capsid proteins in the residues compared. The amino acid sequence divergence between the two strains is 2.8% (or 2.1% when the apparent difference in the capsid proteins is discounted). We also have reported the sequence of the 3'-terminal 351 nucleotides of McMillan RNA (6). Comparison with that for BFS1703 shows three nucleotide substitutions and one deletion (in McMillan) between these two strains, a divergence of 1.1%. These comparisons establish the fact that the widely studied McMillan strain (the prototype WEEV virus) and the BFS1703 strain are the same virus. Since these two strains were isolated 12 years apart in different geographic areas, the rate of divergence of WEEV in nature is at most 0.1-0.2% per year, which is low in comparison to rates that have been established for several RNA viruses (22, 23).
FIG. 1. Sequence of the 3'-terminal 4170 nucleotides of WEEV RNA (strain BFS1703). The start points of the structural proteins are indicated. Asterisks indicate the termination codons for the nonstructural and structural polyproteins. Two independent clones were sequenced and only one clonal difference was found: the GAC encoding Asp-72 of E2 was replaced by a UAC encoding tyrosine in the second clone.
WEEV Is a Recombinant. The amino acid sequences of the
WEEV structural proteins are compared to those of EEEV and of Sindbis virus in Fig. 2. Inspection of this figure clearly reveals that the WEEV capsid protein C is most closely related to that of EEEV, whereas the glycoproteins E2 and E1 are more closely related to the corresponding proteins of Sindbis virus.
The relationships among the proteins of these viruses are summarized in Table 1. The amino-terminal and carboxyl-terminal domains of the capsid protein are considered separately because the carboxyl termini of all alphavirus capsid proteins are closely related. The capsid proteins of WEEV and EEEV are much more closely related (85% sequence identity) than are those of WEEV and Sindbis virus (53% identity). The relationships are reversed in the case of the envelope proteins. The envelope proteins of WEEV and Sindbis virus are much more closely related (71% identity overall) than are those of WEEV and EEEV (46% identity). Figures for the carboxyl-terminal domain of nsP4 are also included. Although this protein is highly conserved among alphaviruses, its carboxyl-terminal domain is more variable, and WEEV and EEEV are much more closely related in this region than are WEEV and Sindbis virus.
Also included in Table 1 are comparisons with another alphavirus, VEEV, to illustrate that alphaviruses in general differ from one another in a uniform and consistent way. Sequence data for Semliki Forest virus or Ross River virus lead to similar results (not shown). WEEV is exceptional in that it is closely related to Sindbis virus in the region of the genome encoding E1 and E2, but to EEEV in other regions.
Table 1. Percent sequence identity among WEEV, EEEV, VEEV, and Sindbis virus (SINV) proteins
                    WEEV   WEEV   EEEV   EEEV   SINV   WEEV
                    EEEV   SINV   SINV   VEEV   VEEV   VEEV
nsP4 (C terminus)    70     35     40
Capsid
  N terminus*        78     39     36     42     27     49
  C terminus*        91     69     64     76     61     77
  Overall            85     53     50     59     44     63
Envelope
  E3                 50     58     42     59     49     56
  E2                 44     68     42     46     40     41
  6K                 44     67     45     54     40     40
  E1                 49     76     51     58     51     50
  Overall            47     71     46     53     46     46
*N terminus refers to amino acids 1-132 of the Sindbis capsid protein or the corresponding positions in the aligned files in Fig. 2. C terminus includes the remaining amino acids in the capsid proteins in the aligned files. Unusually high identity values are shown in boldface type.
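The pattern in Table 1 can be made explicit with a short script. The following is a minimal sketch, not part of the original analysis: it takes the WEEV/EEEV and WEEV/SINV percent identities from Table 1 (the three nsP4 values are assumed to belong to the first three columns), labels each region by the closer relative, and reports where that label changes along the genome. The names `TABLE1`, `closer_parent`, and `switch_points` are illustrative.

```python
# Percent identities transcribed from Table 1, listed in genome order (5' to 3').
# Each entry: (region, WEEV vs EEEV, WEEV vs SINV).
TABLE1 = [
    ("nsP4 C terminus", 70, 35),
    ("Capsid N terminus", 78, 39),
    ("Capsid C terminus", 91, 69),
    ("E3", 50, 58),
    ("E2", 44, 68),
    ("6K", 44, 67),
    ("E1", 49, 76),
]


def closer_parent(weev_eeev, weev_sinv):
    """Return the name of the virus with the higher percent identity to WEEV."""
    if weev_eeev == weev_sinv:
        return "tie"
    return "EEEV" if weev_eeev > weev_sinv else "SINV"


def switch_points(rows):
    """Return (previous_region, region) pairs where the closer relative changes."""
    labels = [(region, closer_parent(e, s)) for region, e, s in rows]
    return [
        (prev[0], cur[0])
        for prev, cur in zip(labels, labels[1:])
        if prev[1] != cur[1]
    ]


if __name__ == "__main__":
    for region, e, s in TABLE1:
        print(f"{region:18s} closer to {closer_parent(e, s)}")
    print("Closer relative changes between:", switch_points(TABLE1))
```

Table 1 covers only protein-coding regions, so this check can locate only the 5' boundary of the Sindbis-like segment; the 3' untranslated region has to be compared at the nucleotide level.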
Nucleotide sequences in the carboxyl-terminal region of nsP4 and in the junction region between structural and nonstructural proteins, which are believed to contain important regulatory elements for transcription of a subgenomic mRNA translated to produce the structural proteins (20), are compared for the three viruses in Fig. 3a. The EEEV and WEEV nucleotide sequences are very similar to one another and the sequences flanking the start of the subgenomic 26S RNA are identical. The sequence of Sindbis virus in this region is similar but not identical. The nsP4 proteins of EEEV and WEEV terminate at the same residue, whereas the Sindbis virus protein terminates downstream.
FIG. 2. Comparison of the amino acid sequences of the structural proteins of WEEV (WEE), EEEV (EEE), and Sindbis virus (SIN). A dot in the EEE or SIN sequence means that the amino acid is the same as that of WEE on the first line. Gaps have been introduced for alignment. Potential glycosylation sites are boxed and cysteines are highlighted with dotted overlay.
FIG. 3. Comparison of the nucleotide sequences in the junction regions of EEEV (EEE), WEEV (WEE), and Sindbis virus (SIN) (a) or at the 3' end of the RNAs (b). Asterisks denote conserved nucleotides. The heavy underlines denote conserved nucleotide sequences in the alphaviruses that are believed to form important regulatory elements for RNA transcription (7). The termination codons that end the nonstructural open reading frames are marked with black circles.
Panel b:
WEE  UAAUUUUUCUUUU  GUUUUUAUUUUGUUUUUAAAAUUUC poly(A)
EEE  UAAUUUUUCUUUUAUGUUUUUAUUUUGUUUUUAAUAUUUC poly(A)
     *************  ******************* *****
SIN  UUUCUUUUAUUAAUCAACAAAAUUUUGUUUUUAACAUUUC poly(A)
     *   **** **          ************* *****
The sequences at the 3' termini of WEEV, EEEV, and
Sindbis virus are shown in Fig. 3b. The 3'-terminal 19 nucleotides have been proposed to form an important element in Alphavirus RNA replication because they are highly conserved among members of this genus (6), and this sequence element (underlined in Fig. 3b) is invariant among these three viruses with the exception of the sixth nucleotide from the end. The nucleotides upstream of this are A/U rich and not particularly conserved among alphaviruses, but in this domain the sequences of WEEV and EEEV are almost identical, whereas that of Sindbis virus is more variable.
These results show that within the region examined, the WEEV nucleotide sequence is recombinant, with both the 5' and 3' ends derived from an EEEV-like virus and the intervening glycoprotein genes derived from a Sindbis-like virus. We presume that the 5'-terminal two-thirds of the genome, which has not yet been sequenced, is also derived from the EEEV-like virus. Partial support for this comes from our previous finding that the 5' terminal sequence of HJV is similar to that of EEEV (5).
The Recombination Events. Our interpretation of the sequence information is shown schematically in Fig. 4, which is included in part to illustrate the structure of the alphavirus genome. In this model, close inspection of the aligned sequences in Fig. 2 suggests that the 5' crossover occurred in E3. Gaps must be introduced into the amino acid sequences to align them, and the two gaps of three amino acids each in E3 are of particular interest. The first gap, following residue 1, is shared by WEEV and EEEV; upstream of this WEEV and EEEV are in almost perfect alignment (only one gap of one amino acid must be introduced into each sequence to maintain alignment), whereas several gaps must be introduced to keep the Sindbis virus sequence aligned. Conversely, the gap following residue 21 of WEEV E3 is shared by Sindbis virus and WEEV; downstream of this, the Sindbis virus and WEEV sequences are in almost perfect register (only one gap of one amino acid is required to maintain alignment), whereas numerous gaps are required to keep the EEEV sequence in register. This suggests that the recombination event occurred between these two gaps in E3, which is compatible with the sequence similarities exhibited by the capsid proteins and the glycoproteins in Table 1.
FIG. 4. Schematic representation of the recombination event that produced WEEV (WEE). The crossover points to produce WEE are indicated. SIN, Sindbis virus; EEE, EEEV. Genome regions shown: nsP1, nsP2, nsP3, nsP4, C, E3, E2, 6K, E1.
The 3' crossover appears to have occurred in the 3' untranslated region. The 60 nucleotides of WEEV RNA following the structural protein stop codon are very similar to the Sindbis virus sequence, whereas the last 80 nucleotides of the RNA are similar to EEEV, with no sequence similarity detectable in between. Although a double crossover seems inherently less likely than a single crossover, the presence of important replication signals at the 3' end may require such an event to produce viable (or at least efficiently replicating) virus (6, 7).
There is a formal possibility that WEEV is one of the parental viruses in a cross that resulted in the reciprocal recombinants Sindbis virus and EEEV. Because RNA recombination is believed to occur by a copy-choice mechanism, however, in which reciprocal recombinants are not produced (24), and because of the apparent rarity of viable recombinant viruses, this possibility appears remote.
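The 3'-terminal comparison in Fig. 3b, which underlies the placement of the 3' crossover, can be checked mechanically. The sketch below is illustrative only: the three sequences are transcribed from Fig. 3b, the two-nucleotide gap in the WEEV row is inferred from the row lengths and is an assumption, and the function names are hypothetical. It lists which positions of the 3'-terminal 19 nucleotides vary among the three viruses and counts matches to WEEV in the upstream columns.

```python
# Aligned 3'-terminal sequences preceding the poly(A) tract, transcribed from Fig. 3b.
# "-" marks an alignment gap (the gap length in the WEE row is an assumption).
FIG3B = {
    "WEE": "UAAUUUUUCUUUU--GUUUUUAUUUUGUUUUUAAAAUUUC",
    "EEE": "UAAUUUUUCUUUUAUGUUUUUAUUUUGUUUUUAAUAUUUC",
    "SIN": "UUUCUUUUAUUAAUCAACAAAAUUUUGUUUUUAACAUUUC",
}
ELEMENT_LENGTH = 19  # length of the conserved 3'-terminal element


def variable_positions_from_3prime(seqs, length):
    """Positions (1 = last nucleotide) in the 3'-terminal window that are not invariant."""
    tails = [s[-length:] for s in seqs]
    return sorted(
        length - i
        for i, column in enumerate(zip(*tails))
        if len(set(column)) > 1
    )


def matches(a, b):
    """Count aligned columns where both sequences carry the same nucleotide."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


if __name__ == "__main__":
    seqs = list(FIG3B.values())
    print("Variable positions from the 3' end:",
          variable_positions_from_3prime(seqs, ELEMENT_LENGTH))
    upstream = {name: s[:-ELEMENT_LENGTH] for name, s in FIG3B.items()}
    for other in ("EEE", "SIN"):
        print(f"WEE vs {other}, upstream matches:",
              matches(upstream["WEE"], upstream[other]),
              "of", len(upstream["WEE"]), "columns")
```

Only the roughly 40 nucleotides shown in Fig. 3b are covered here; the statement about the last 80 nucleotides rests on the full sequences.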
Interaction of the Nucleocapsid and Glycoproteins During Virus Budding. Alphaviruses mature when preassembled nucleocapsids, which are icosahedral structures consisting of 180 copies of the nucleocapsid protein and one molecule of the virus RNA, acquire an envelope by budding through the plasma membrane (25, 26). The envelope consists of a lipid bilayer derived from the host cell in which are embedded two virus-encoded glycoproteins, E2 and E1. The nucleocapsid and the glycoproteins are thought to interact specifically with one another, so as to exclude nonvirus proteins from the structure; the free energy for driving virus budding is derived from these specific interactions. During evolution, certain domains of the glycoproteins of a particular virus must have been selected for maximal specific interaction with the capsid of that virus. In a recombinant virus that contains the capsid protein from one virus and the glycoproteins from another, the interactions during budding might not be optimal. During passage of such a recombinant virus, selection pressure would favor variants in which the nucleocapsid and glycoprotein interactions were improved. It is thus of considerable interest that there are only seven amino acid differences between WEEV and EEEV in the carboxyl-terminal 104 amino acids of the capsid protein, and for 6 of these WEEV has the Sindbis virus amino acid (Fig. 2). This suggests that this domain of the capsid protein interacts with the glycoproteins during virus assembly and that, after the recombination event, selection has led to some of the EEEV capsid amino acids being replaced with Sindbis virus amino acids to allow more efficient interaction with the Sindbis virus glycoproteins. Conversely, in the carboxyl-terminal 16 amino acids of E2, there are 6 amino acid differences between WEEV and Sindbis virus, and for 4 of these WEEV has the EEEV amino acid, suggesting by the same logic that this domain of E2 interacts with the capsid during budding. Other examples can be found in other regions of the structural proteins. The hypothesis that these domains are involved in capsid-glycoprotein interactions can be tested by site-specific mutagenesis (27).
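The counting argument above (sites where WEEV differs from the lineage that donated a protein and instead carries the other lineage's residue) can be expressed as a small function. This is a sketch under stated assumptions: it expects three pre-aligned sequences of equal length with "-" for gaps, the function name is hypothetical, and the short peptide strings in the example are invented placeholders, not data from Fig. 2.

```python
def compensatory_sites(recombinant, donor, partner):
    """Count sites where the recombinant differs from the donor lineage,
    and how many of those sites carry the partner lineage's residue.

    All three arguments are aligned sequences of equal length; columns
    containing a gap ("-") in any sequence are skipped.
    """
    if not (len(recombinant) == len(donor) == len(partner)):
        raise ValueError("sequences must be aligned to the same length")
    differing = 0
    matching_partner = 0
    for r, d, p in zip(recombinant, donor, partner):
        if "-" in (r, d, p):
            continue
        if r != d:
            differing += 1
            if r == p:
                matching_partner += 1
    return differing, matching_partner


if __name__ == "__main__":
    # Invented toy alignment for illustration only.
    toy_recombinant = "MKLVAGTQ"
    toy_donor = "MRLVSGTE"
    toy_partner = "MKIVAGSD"
    print(compensatory_sites(toy_recombinant, toy_donor, toy_partner))
```

For the carboxyl-terminal 104 residues of the capsid protein, with EEEV as donor and Sindbis virus as partner, the text reports 7 differing sites of which 6 carry the Sindbis virus residue; for the carboxyl-terminal 16 residues of E2, with the roles reversed, it reports 6 and 4.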
DISCUSSION
The Origin of WEEV. The two parents of WEEV and the time of the recombination event cannot be determined at the current time. As described earlier, the McMillan strain of WEEV isolated in 1941 in Canada and the BFS1703 strain isolated in 1953 in California are clearly strains of the same virus. They have nearly identical capsid proteins, glycoproteins E2 and E1, and 3'-terminal sequences. Thus, the recombination event could not have occurred during passage of the virus in culture, as this would have required the identical recombination event to have occurred twice, in different laboratories. By the same logic, the recombination event must have predated the isolation of the McMillan strain of WEEV in 1941. Furthermore, all of the sequence information obtained is compatible with the hypothesis that the recombinant virus arose before the separation of WEEV and HJV. On the other hand, the amino-terminal portions of the capsid proteins of WEEV and EEEV are very similar, even though this lysine- and arginine-rich domain is not well conserved among alphaviruses (28). Thus, the similarity in the WEEV and EEEV sequences, together with the fact that RNA viruses diverge rapidly (21), suggests that the recombination event must be relatively recent. We propose that one of the parents was EEEV itself. The sequence similarities with Sindbis virus in the envelope protein regions are not as pronounced and suggest that the second parent was not Sindbis virus itself but a relative of it. Because WEEV and EEEV are New World viruses, we propose that the recombination event occurred in the New World between EEEV or an immediate ancestor of it and a Sindbis-like virus that has yet to be identified. It seems most likely that the recombination event took place in the mosquito vector, in which the virus sets up a persistent life-long infection. EEEV and HJV overlap in geographic ranges and mosquito vector. Thus HJV might represent the ancestral recombinant virus that radiated to produce WEEV.
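The strain comparison behind this argument reduces to simple arithmetic on the counts reported in Results. The following sketch recomputes the percentages and the per-year upper bound; the variable and function names are illustrative, and treating the 12-year isolation interval as the full divergence time is the same simplification the text makes.

```python
def percent(differences, length):
    """Differences per 100 compared positions."""
    return 100.0 * differences / length


YEARS_APART = 1953 - 1941  # McMillan (1941) vs BFS1703 (1953)

# Counts reported in Results: (differences, positions compared).
comparisons = {
    "amino acids, all differences": (4, 142),
    "amino acids, capsid difference discounted": (3, 142),
    "3'-terminal nucleotides (3 substitutions + 1 deletion)": (4, 351),
}

if __name__ == "__main__":
    for label, (diffs, length) in comparisons.items():
        total = percent(diffs, length)
        print(f"{label}: {total:.1f}% total, {total / YEARS_APART:.2f}% per year")
```

Dividing by the isolation interval yields an upper bound because the two lineages may have separated before 1941.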
Recombination in RNA Virus Evolution. There has been much speculation about the importance of recombination in the evolution of RNA viruses (29, 30). In segmented RNA viruses, reassortment of individual genome segments during mixed infection, a form of recombination equivalent to the shuffling of chromosomes in diploid creatures, is readily demonstrated in cell culture. Reassortment is a major mechanism for generating new pandemic strains of influenza virus (31, 32), and it may be that the ability to undergo ready recombination conveys significant selective advantage. Among the nonsegmented RNA viruses, recombination has been in general more difficult to demonstrate, but it has been shown to occur in the picornaviruses (33, 34), the coronaviruses (35, 36), and the bromoviruses (37), although not before now in the alphaviruses. In poliovirus, recombination occurs by a copy-choice mechanism during RNA replication (24), and it is assumed that all RNA recombination (as opposed to reassortment) occurs by this mechanism. Although well established in principle, evidence for the importance of recombination in nature as a mechanism that leads to successful new strains is limited. In the case of poliovirus, recombination has been shown to occur in vaccinees that have simultaneously received high doses of three attenuated viruses (34), but this is not a natural system. The finding that WEEV, a virus with a wide geographic range, is a naturally occurring recombinant lends support to the hypothesis that RNA recombination is an important force in the evolution of RNA viruses. In this particular case, it has given rise to a new virus that combines the disease-causing potential of EEEV with new antigenic properties from a Sindbis virus-like virus.
We thank Drs. M. Stanley and J. Hardy for the WEEV RNA used in this project. This work was supported by Grants AI20612 and AI10793 from the National Institutes of Health and Grant DMB86-17372 from the National Science Foundation.
1. Griffin, D. E. (1986) in The Togaviridae and Flaviviridae, eds. Schlesinger, S. & Schlesinger, M. (Plenum, New York), pp. 209-249.
2. Chamberlain, R. W. (1980) in The Togaviruses, ed. Schlesinger, R. W. (Academic, New York), pp. 175-227.
3. Calisher, C. H., Shope, R. E., Brandt, W., Casals, J., Karabatsos, N., Murphey, F. A., Tesh, R. B. & Wiebe, M. E. (1980) Intervirology 14, 229-232.
4. Hayes, C. G. & Wallis, R. C. (1977) Adv. Virus Res. 21, 37-83.
5. Ou, J.-H., Strauss, E. G. & Strauss, J. H. (1983) J. Mol. Biol. 168, 1-15.
6. Ou, J.-H., Trent, D. W. & Strauss, J. H. (1982) J. Mol. Biol. 156, 719-730.
7. Strauss, E. G. & Strauss, J. H. (1986) in The Togaviridae and Flaviviridae, eds. Schlesinger, S. & Schlesinger, M. (Plenum, New York), pp. 35-90.
8. Niklasson, B., Espmark, A., LeDuc, J. W., Gargan, T. P., Ennis, W. A., Tesh, R. B. & Main, A. J. (1984) Am. J. Trop. Med. Hyg. 33, 1212-1217.
9. Strauss, E. G., Rice, C. M. & Strauss, J. H. (1984) Virology 133, 92-110.
10. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, H. (1980) Nature (London) 288, 236-241.
11. Garoff, H., Frischauf, A.-M., Simons, K., Lehrach, H. & Delius, H. (1980) Proc. Natl. Acad. Sci. USA 77, 6376-6380.
12. Takkinen, K. (1986) Nucleic Acids Res. 14, 5667-5682.
13. Dalgarno, L., Rice, C. M. & Strauss, J. H. (1983) Virology 129, 170-187.
14. Chang, G.-J. J. & Trent, D. W. (1987) J. Gen. Virol. 68, 2129-2142.
15. Kinney, R. M., Johnson, R. J. B., Brown, V. C. & Trent, D. W. (1986) Virology 152, 400-413.
16. Hardy, J. L., Reeves, W. C., Rush, W. A. & Nir, Y. D. (1974) Infect. Immun. 10, 553-564.
17. Lindqvist, B. H., DiSalvo, J., Rice, C. M., Strauss, J. H. & Strauss, E. G. (1986) Virology 151, 10-20.
18. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
19. Smith, D. R. & Calvo, J. M. (1980) Nucleic Acids Res. 8, 2225-2274.
20. Ou, J.-H., Rice, C. M., Dalgarno, L., Strauss, E. G. & Strauss, J. H. (1982) Proc. Natl. Acad. Sci. USA 79, 5235-5239.
21. Bell, J. R., Bond, M. W., Hunkapiller, M. W., Strauss, E. G., Strauss, J. H., Yamamoto, K. & Simizu, B. (1983) J. Virol. 45, 708-714.
22. Steinhauer, D. A. & Holland, J. J. (1987) Annu. Rev. Microbiol. 41, 409-433.
23. Smith, D. B. & Inglis, S. C. (1987) J. Gen. Virol. 68, 2729-2740.
24. Kirkegaard, K. & Baltimore, D. (1986) Cell 47, 433-443.
25. Strauss, E. G. & Strauss, J. H. (1985) in Virus Structure and Assembly, ed. Casjens, S. (Jones & Bartlett, Boston), pp. 205-234.
26. Fuller, S. D. (1987) Cell 48, 923-934.
27. Rice, C. M., Levis, R., Strauss, J. H. & Huang, H. V. (1987) J. Virol. 61, 3809-3819.
28. Rice, C. M. & Strauss, J. H. (1981) Proc. Natl. Acad. Sci. USA 78, 2062-2066.
29. Strauss, J. H. & Strauss, E. G. (1988) Annu. Rev. Microbiol. 42, 657-683.
30. Hodgman, T. C. & Zimmern, D. (1988) in RNA Genetics, eds. Domingo, E., Holland, J. J. & Ahlquist, P. (CRC Press, Boca Raton, FL), Vol. 3, in press.
31. Desselberger, U., Nakajima, K., Alfino, P., Pederson, F. S., Haseltine, W. A., Hannoun, C. & Palese, P. (1978) Proc. Natl. Acad. Sci. USA 75, 3341-3345.
32. Webster, R. G., Laver, W. G., Air, G. M. & Schild, G. C. (1982) Nature (London) 296, 115-121.
33. Cooper, P. D. (1977) in Comprehensive Virology, eds. Fraenkel-Conrat, H. & Wagner, R. R. (Plenum, New York), Vol. 9, pp. 133-207.
34. Kew, O. M. & Nottay, B. K. (1984) in Modern Approaches to Vaccines: Molecular and Chemical Basis of Virus Virulence, ed. Channock, R. M. (Cold Spring Harbor Lab., Cold Spring Harbor, NY), pp. 357-362.
35. Lai, M. M. C., Baric, R. S., Makino, S., Keck, J. G., Egbert, J., Leibowitz, J. L. & Stohlman, S. A. (1985) J. Virol. 56, 449-456.
36. Makino, S., Keck, J. G., Stohlman, S. A. & Lai, M. M. C. (1986) J. Virol. 57, 729-737.
37. Bujarski, J. J. & Kaesberg, P. (1986) Nature (London) 321, 528-531.