The nucleoprotein gene of ebola virus: Cloning, sequencing, and in vitro expression



VIROLOGY 170, 81-91 (1989)

The Nucleoprotein Gene of Ebola Virus: Cloning, Sequencing, and in vitro Expression

ANTHONY SANCHEZ,¹ MICHAEL P. KILEY, BRIAN P. HOLLOWAY, JOSEPH B. MCCORMICK, AND DAVID D. AUPERIN

Division of Viral Diseases, Center for Infectious Diseases, Centers for Disease Control, Atlanta, Georgia 30333

Received September 22, 1988; accepted January 3, 1989

Genomic and messenger RNAs of a Zaire strain of Ebola virus were cloned, and inserts specific for the nucleoprotein gene were isolated and sequenced. The nucleoprotein gene is located proximal to the 3' end of the genome and is preceded by a putative leader sequence. The gene begins with the transcriptional start site sequence (3'-UACUCCUUCUAAUU . . . ) and ends with the polyadenylation site sequence 3'-. . . UAAUUCUUUUUU. The predicted coding region is 2217 bases in length and encodes a protein that contains 739 amino acids, with a calculated molecular weight of 83.3 kDa. The protein has an approximate net charge of -30 and can be divided into a hydrophobic N-terminal half and a hydrophilic and highly acidic C-terminal half. An in vitro transcript, generated from plasmid DNA containing the entire coding region, directs the synthesis of authentic nucleoprotein in a rabbit reticulocyte lysate system. The genomic organization and transcriptional signals of Ebola are similar to those of other nonsegmented, negative-strand RNA viruses, but nucleic acid or amino acid sequence comparisons indicate a lack of similarity. © 1989 Academic Press, Inc.

INTRODUCTION

Ebola virus, an extremely virulent African hemorrhagic fever virus, has been isolated from humans following outbreaks in central Africa, which occurred from 1976 to 1979 (Bowen et al., 1977; Johnson et al., 1977; Heymann et al., 1980; Baron et al., 1983). Two subtypes were isolated during these outbreaks and are designated as Zaire and Sudan strains. These strains have previously been shown to differ in pathogenicity, antigenicity, and genomic composition (Buchmeier et al., 1983; Cox et al., 1983; McCormick et al., 1983; Richman et al., 1983; Elliott et al., 1985). Zaire strains are more virulent, with an approximate mortality rate of 90% in humans.

The Ebola virion is bacilliform and composed of a helical nucleocapsid surrounded by a lipoprotein envelope. There are seven structural proteins: a major nucleoprotein (NP), a surface glycoprotein (GP), a polymerase (L), what appears to be a minor nucleoprotein (VP30), and three proteins (VP40, VP35, and VP24) whose functions have not been identified (Elliott et al., 1983; Kiley et al., 1980). The genome is a nonsegmented RNA strand with an approximate molecular weight of 4 × 10^6 (Regnery et al., 1981). The negative-stranded nature of the virus was verified by demonstrating that mRNA transcripts are complementary to the vRNA (Sanchez and Kiley, 1987). These transcripts were shown to be monocistronic and polyadenylated, with the NP gene as the first gene to be transcribed.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. J04337.

¹ To whom requests for reprints should be addressed.

Because of similarities in morphology, biochemical composition, and genetic organization, Ebola virus and the closely related Marburg virus have been associated with members of the two nonsegmented negative-strand RNA virus families, Paramyxoviridae and Rhabdoviridae. Recent RNA sequence analyses of the Ebola and Marburg genomes have shown that their 3' terminal sequences are identical for 12 bases (Kiley et al., 1986) but are completely unrelated to corresponding sequences of rhabdoviruses and paramyxoviruses. Due to their unique nature, Ebola and Marburg viruses have been designated charter members of a new family of nonsegmented negative-strand RNA viruses, the Filoviridae. Despite their similarities, there is no evidence that Ebola and Marburg viruses are antigenically related.

To better define the genetic relationship of filoviruses to other nonsegmented negative-strand RNA viruses, we began a program directed at cloning and sequencing the genome of a Zaire strain of Ebola virus (Mayinga). In this paper we present cloning and sequencing data on the first 3027 bases from the 3' end of the Ebola Mayinga genome, including a putative leader sequence and the complete NP gene. In addition, we compare the transcriptional signals of the NP gene to those of rhabdoviruses and paramyxoviruses and demonstrate the in vitro expression of the NP coding region.

MATERIALS AND METHODS

Viruses and cells

The Mayinga strain of Ebola, a Zaire subtype (Kiley et al., 1986), was used in molecular cloning and sequencing studies. This virus was plaque-purified three times and passaged two to four times thereafter. The virus was cultured in E6 cells, a cloned line of Vero cells (ATCC CRL 1586), as previously described (Elliott et al., 1985; Sanchez and Kiley, 1987).

0042-6822/89 $3.00 Copyright © 1989 by Academic Press, Inc. All rights of reproduction in any form reserved.

RNA extraction, cDNA synthesis, and molecular cloning

Ebola Mayinga viral RNA (vRNA) and mRNA were purified (Sanchez and Kiley, 1987) and used as templates in the synthesis of cDNA. Three strategies of cDNA synthesis were employed using conventional molecular biological techniques. Initial cDNA preparations were obtained from NP mRNA enriched by sucrose gradient fractionation of poly(A)-selected infected cell RNA (Maniatis et al., 1982; Sanchez and Kiley, 1987). First-strand cDNA was produced by priming with a 5'-phosphorylated oligothymidylic acid preparation, oligo-pd(T)12-18 (Pharmacia), and extending with AMV reverse transcriptase (RT; Life Sciences, Inc.). Second-strand synthesis via hairpin priming was performed as previously described (Auperin et al., 1986). Synthesis of cDNA from vRNA template took two approaches. First, vRNA was poly(A)-tailed at the 3' end with Escherichia coli poly(A) polymerase to protect and later identify the 3' end of the genome. The reaction was performed in a 30-µl volume composed of 0.5 unit enzyme, 50 mM Tris-HCl (pH 7.9), 10 mM MgCl2, 10 mM MnCl2, 250 mM NaCl, 500 µg/ml BSA, and 1 mM ATP. The reaction was carried out for 5 min at 37°, then stopped by the addition of 1 µl of 500 mM EDTA; the RNA was extracted and precipitated, and first-strand synthesis was primed with oligo-pd(T) and extended with RT as before. The RNA template was digested from the DNA-RNA duplex by resuspending the precipitated duplex in 50 mM Tris-HCl, pH 8.3, then adding 0.3 M NaOH to a final concentration of 0.1 M and heating at 65° for 20 min. The solution was then neutralized with 0.3 M HCl and the DNA precipitated. The DNA was then poly(A)-tailed at the 3' end in a 25-µl reaction composed of 9 units terminal deoxynucleotidyl transferase, 100 mM Na cacodylate (pH 7.2), 10 mM MgCl2, 10 mM DTT, and 1 mM dATP. The reaction was carried out for 5 min at 37°, then the DNA was extracted and precipitated. Second-strand synthesis was primed with oligo-pd(T) and extended as before.
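The alkaline hydrolysis step gives stock and final NaOH concentrations but no volumes. As a rough illustration only, the sketch below works out how much 0.3 M stock brings a sample to 0.1 M. The 20-µl sample volume and the function name are hypothetical, volumes are assumed additive, and the buffering effect of the Tris is ignored.

```python
def volume_to_add(sample_ul, stock_molar, final_molar):
    """Volume of stock (in µl) that brings the mixture to final_molar.

    Solves stock * v = final * (sample + v) for v, assuming the sample
    contains none of the solute and that volumes are additive.
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must lie between 0 and the stock")
    return sample_ul * final_molar / (stock_molar - final_molar)


sample_ul = 20.0  # hypothetical resuspension volume; not stated in the text

# 0.3 M NaOH added to a final concentration of 0.1 M: half the sample volume.
naoh_ul = volume_to_add(sample_ul, stock_molar=0.3, final_molar=0.1)

# Equimolar neutralization with 0.3 M HCl needs the same volume as the NaOH.
hcl_ul = naoh_ul * 0.3 / 0.3

print(f"add {naoh_ul:.1f} µl of 0.3 M NaOH, then {hcl_ul:.1f} µl of 0.3 M HCl")
```

Under these assumptions the stock is always added at half the sample volume, whatever that volume is.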

To derive clones more evenly spread across the genome, a random priming method was used for first-strand synthesis. First-strand cDNA synthesis was primed off vRNA templates with an oligo-pd(N)6 random primer (Pharmacia) using RT. RNA-DNA hybrids were treated with RNase H, DNA polymerase I (Klenow fragment), and E. coli DNA ligase, essentially as described by Gubler and Hoffman (1983), to produce double-stranded cDNA. Subsequent synthesis of cDNA from poly(A)-selected Ebola Mayinga mRNA template followed the same oligo(dT) priming for first-strand synthesis, but second-strand synthesis employed the RNase H method described above.

All preparations of double-stranded cDNA were digested with S1 nuclease and repaired with the Klenow fragment of DNA polymerase I. The blunt-end cDNA was ligated into the SmaI site of the plasmid pUC18 and cloned, and viral-specific clones were identified as previously described (Grunstein and Hogness, 1975; Maniatis et al., 1982). Plasmid DNA was isolated by the method of Ish-Horowitz and Burke (1981) and stored frozen in water.

Nucleotide sequence analysis

DNA sequences were determined by the method of Maxam and Gilbert (1980), as modified by Bishop et al. (1982). A dideoxy chain termination method (Sanger et al., 1977) modified for RNA template (Rico-Hesse et al., 1987; Zimmern and Kaesberg, 1978) was used to sequence specific regions of the Ebola vRNA. The oligodeoxynucleotide primers 5'-GGACACACAAAAAGAAAGAA, 5'-AGCGTGATGGAGTGAAGCGCC, and 5'-GCCATAATTGTAACTCAATAT, which are complementary to the vRNA sequences 2-21, 798-818, and 2907-2927, respectively, were used in dideoxy sequencing to confirm sequences derived from cloned DNA. All primers used in this study were prepared in an automated DNA synthesizer (Applied Biosystems; Model 380A). The 5' terminus of the NP mRNA was sequenced using a primer (5'-GATGTGGCTCTGAAACAAACC) corresponding to vRNA sequences 152 to 132. The primer was radioactively labeled at the 5' end with [γ-32P]ATP and T4 polynucleotide kinase, and purified by electrophoresis on a 20% sequencing gel. The oligonucleotide was annealed to poly(A)-selected Ebola mRNA and extended with RT. Products of the reaction were isolated by electrophoresis on a 6% sequencing gel and chemically sequenced as described above.
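As a quick consistency check on the primers listed above, the sketch below compares each primer's length with the span of its stated vRNA coordinates and derives the mRNA-sense target of the primer-extension oligonucleotide. The helper names are illustrative and not part of the original methods.

```python
# Complement table for DNA written with the four unambiguous bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def span_length(first, last):
    """Number of bases in an inclusive coordinate range."""
    return abs(last - first) + 1


# (primer sequence 5'->3', first vRNA coordinate, last vRNA coordinate)
PRIMERS = {
    "dideoxy 2-21": ("GGACACACAAAAAGAAAGAA", 2, 21),
    "dideoxy 798-818": ("AGCGTGATGGAGTGAAGCGCC", 798, 818),
    "dideoxy 2907-2927": ("GCCATAATTGTAACTCAATAT", 2907, 2927),
    "mRNA 5' end 152-132": ("GATGTGGCTCTGAAACAAACC", 152, 132),
}

for name, (sequence, first, last) in PRIMERS.items():
    matches = len(sequence) == span_length(first, last)
    print(f"{name}: {len(sequence)} nt, coordinates span "
          f"{span_length(first, last)} nt, consistent={matches}")

# The extension primer anneals to mRNA, so its reverse complement is the
# mRNA-sense (vcRNA-sense) stretch it covers, written here as DNA.
mrna_target = reverse_complement(PRIMERS["mRNA 5' end 152-132"][0])
print(mrna_target)
```

All four primer lengths agree with their coordinate ranges, which supports the repaired primer sequences and coordinates given in the text.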

Northern-blot hybridization

Ebola Mayinga vRNA, mRNA, and infected cell RNA were electrophoresed in acid-urea-agarose (1.5% w/v) slab gels (Rosen et al., 1975). After electrophoresis, gels were treated with 50 mM NaOH containing 1 µg/ml ethidium bromide for 20 min, washed with 25 mM sodium phosphate buffer, pH 6.5, photographed, and blotted onto GeneScreen² hybridization transfer membrane (New England Nuclear; No. NEF-972) by capillary action using the wash buffer. Membranes were then air-dried and baked at 80° under vacuum for 2 hr. Nick-translated probes were derived from cloned cDNA and were hybridized to membranes under stringent conditions (50% formamide), as described by the manufacturer (GeneScreen RNA Hybridization Method II).

Construction, site-directed mutagenesis, and expression of the NP gene

The complete nucleoprotein gene was assembled using the plasmids and restriction sites diagrammed in Fig. 6, resulting in the plasmid pEMNP. Site-directed mutagenesis of pEMNP was performed by a gapped-duplex method (Oostra et al., 1983) using a synthetic primer (5'-TTGCTCGGAATCACAAGGATCCGAGTATGGATTCT) to create a second BamHI restriction site 5 bases upstream of the methionine-initiated open reading frame (ORF) at base 470. This plasmid is designated pEMNP-El in Fig. 6. A 2365-bp fragment containing the Ebola NP ORF in pEMNP-El was excised by digestion with BamHI (460-465) and DraI (2823-2828), isolated by agarose electrophoresis, and directionally ligated into the unique BamHI and SmaI sites of the cloning vector pSP64 (Melton et al., 1984). The resulting plasmid (pSP6-EMNP) was used to generate uncapped transcripts (Krieg and Melton, 1984; Melton et al., 1984) using a commercial transcription system (Promega). Transcripts were translated in vitro using a rabbit reticulocyte lysate system, then viral proteins were immunoprecipitated and resolved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) as previously described (Sanchez and Kiley, 1987).
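The 2365-bp fragment size can be reproduced from the two site coordinates and the known cleavage positions of the enzymes (BamHI cuts G^GATCC, DraI cuts TTT^AAA). The sketch below counts top-strand bases only, so the 4-base BamHI overhang on the opposite strand is not included. The function name is illustrative.

```python
def last_base_before_cut(site_start, bases_left_of_cut):
    """Coordinate of the last top-strand base on the 5' side of the cut.

    site_start        -- coordinate of the first base of the recognition site
    bases_left_of_cut -- how many bases of the site lie 5' of the cut
    """
    return site_start + bases_left_of_cut - 1


# BamHI site at 460-465, cleaved G^GATCC: one base of the site precedes the cut.
bamhi_cut = last_base_before_cut(460, 1)

# DraI site at 2823-2828, cleaved TTT^AAA (blunt): three bases precede the cut.
drai_cut = last_base_before_cut(2823, 3)

# The excised top strand runs from the base after the BamHI cut
# through the last base before the DraI cut.
fragment_length = drai_cut - bamhi_cut

print(f"fragment spans {bamhi_cut + 1}-{drai_cut}: {fragment_length} bases")
```

The result matches the stated 2365 bp. Because the DraI end is blunt, it is compatible with the SmaI site of pSP64, which is what makes the ligation directional.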

RESULTS

Cloning and sequence analysis of the 3’end of the Ebola vRNA

Figure 1 shows the principal viral-specific cDNA clones used to obtain sequence information for the 3' end of the Ebola Mayinga genome. Clone 65 was the first clone to be sequenced and was produced from a mRNA preparation enriched for NP transcripts through sucrose gradient fractionation. It provided cDNA probes used in the isolation of clones D17, D16, and V6 from vRNA libraries. Clone D17 was generated from vRNA that had been poly(A)-tailed at the 3' end, with first-strand synthesis primed with oligo(dT). This clone contains a homopolymeric sequence of 25 A residues that leads into the same 3' end sequences of the Ebola Mayinga genome that were previously determined by direct RNA sequencing (Kiley et al., 1986). Clones D16 and V6 were derived from random-primed vRNA libraries. Sequence information was double-strand data obtained from chemical sequencing, using the restriction enzymes shown in the lower part of Fig. 1. In addition, dideoxy sequencing using vRNA as template was used to verify bases 66-184, 836-1203, and 2936-3027.

² Use of trade names is for identification only and does not imply endorsement by the Public Health Service or by the U.S. Department of Health and Human Services.

Northern-blot hybridization of clones D17 and V6 to Ebola Mayinga vRNA and mRNA, shown in Fig. 2, demonstrates their viral specificity and identifies the mRNA transcripts of the first two genes. Clone D17 hybridizes to a single RNA species (approximately 28S in size), previously shown to be the NP gene transcript (Sanchez and Kiley, 1987), and establishes its location at the 3' end of the genome. Clone V6, a vRNA clone that contains sequences that overlap the NP gene and the second gene, hybridizes to the NP mRNA and a 19S virus-specific transcript previously identified as mRNA 4 (Sanchez and Kiley, 1987).

Figure 3 shows the nucleotide sequence of the 3' end of the Ebola Mayinga genome, presented as viral-complementary RNA (vcRNA), extending to position 3027. Identified on this sequence are a putative leader sequence, the NP gene transcriptional start and stop sites, and the NP coding region.

Identification of the transcriptional start and stop sites

The transcriptional start and stop sites for the NP gene were identified by sequence analysis of the 5' and 3' ends of the NP mRNA. Figure 4A shows sequencing results for the 5' end (start site) of the NP mRNA, produced by primer extension and chemical sequencing of the extended products. The sequence, reading 5' to 3', ends at a cytosine residue (position 56 in Fig. 3), followed by two more nucleotides (X's in Fig. 4A). These last two nucleotides represent two extension products and were seen following electrophoretic isolation, but their proximity precluded separate elution. We attribute the synthesis of the two products to early termination of the extension reaction caused by a cap structure at the 5' end, with the longer copy ending at the actual transcriptional start site. Reports that RT can add an additional base to the extension product by copying the 5' m7G cap (Gupta and Kingsbury, 1984) were considered as evidence for the synthesis of the larger copy, but this was rejected because the mRNA would then have a pyrimidine residue at the 5' end instead of the usual purine. The transcriptional start site on the vRNA was thus determined to begin at base 54 with the sequence 3'-UACUCCUUCUAAUU . . . , and allows for a purine-linked 5' cap on the NP mRNA.

[Figure 1: schematic of the NP mRNA and the cloned inserts above a restriction map; horizontal axis, nucleotide sequence number.]

FIG. 1. Schematic representation of the 3' end of the Ebola Mayinga genome and the primary cDNA clones used in sequencing studies. The sequence includes a putative leader sequence and the complete NP gene. At the bottom is a restriction map of the cDNA information showing the restriction enzymes used in chemical sequence analysis. Cloned inserts correspond to the following vRNA sequences: D17, 1-2279; D16, 1530-2702; V6, 2945-3488; 65, 1885-2595; N13, 2732-3027.

[Figure 2: three panels (EtBr, D17, V6), each with lanes V, M, and C; positions of 28S and 18S rRNA are marked.]

FIG. 2. Northern-blot hybridization of nick-translated probes, prepared from cDNA clones D17 and V6, to Ebola Mayinga vRNA (V), a crude preparation of Ebola Mayinga mRNA (M), and control uninfected Vero E6 cell RNA (C). At the extreme left is a photograph of the electrophoresed RNA preparations before blotting, with cellular 28S and 18S rRNAs marked.

The transcriptional stop site was predicted due to its homology with the Sendai virus polyadenylation site and its position after the NP ORF. This stop site was confirmed by isolating and sequencing an NP mRNA clone (N13) that contains part of the poly(A) tail (Fig. 4B). These results identify the polyadenylation site of the NP gene as 3'-. . . UAAUUCUUUUUU. The location of the NP gene transcription start and stop sequences relative to the ORF indicates that the NP mRNA has long noncoding regions of 417 nucleotides at the 5' end and 341 nucleotides at the 3' end (excluding nonviral-complementary A-tail residues).
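Both transcriptional signals are quoted in vRNA sense, written 3' to 5'. Complementing them base by base, without reversing, gives the corresponding mRNA sequence in the usual 5' to 3' orientation. The following is a minimal sketch with an illustrative function name.

```python
# Complement table for RNA written with the four unambiguous bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcript_from_template(template_3_to_5):
    """Complement a template written 3'->5'; the result reads 5'->3'.

    No reversal is needed because the template is already written in the
    direction opposite to the transcript.
    """
    return template_3_to_5.translate(RNA_COMPLEMENT)


start_signal_vrna = "UACUCCUUCUAAUU"   # 3'-UACUCCUUCUAAUU . . .
polya_signal_vrna = "UAAUUCUUUUUU"     # 3'-. . . UAAUUCUUUUUU

mrna_start = transcript_from_template(start_signal_vrna)
mrna_end = transcript_from_template(polya_signal_vrna)

print("mRNA 5' end:", mrna_start)
print("mRNA 3' end before the poly(A) tail:", mrna_end)
```

In mRNA sense the start signal begins with AUG, and the run of six U residues in the polyadenylation signal becomes the first A residues of the 3' end.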

Analysis of the NP gene coding region

The predicted coding region for the NP gene originates at base 470 of the vcRNA sequence and terminates at base 2687 with a UGA stop codon. It is initiated by the second AUG in the mRNA sequence, with the first AUG beginning the mRNA sequence at the 5' end. The ORF was determined from the analysis of all three reading frames of vcRNA and vRNA sequences and is the only ORF sufficient in length to code for the nucleoprotein.
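The coordinates above can be checked against the stated ORF length and protein size. The sketch assumes that base 2687 is the first base of the UGA stop codon, which is the reading that reproduces the 2217-base coding region. The names are illustrative.

```python
ORF_START = 470          # first base of the initiating AUG (vcRNA numbering)
STOP_CODON_START = 2687  # assumed to be the first base of the UGA stop codon

coding_bases = STOP_CODON_START - ORF_START   # bases 470 through 2686
codon_count, remainder = divmod(coding_bases, 3)


def codon_start(residue_number):
    """vcRNA coordinate of the first base of the codon for a given residue."""
    return ORF_START + 3 * (residue_number - 1)


print(f"coding region: {coding_bases} bases, {codon_count} codons, "
      f"remainder {remainder}")
print(f"codon for residue 739 starts at base {codon_start(739)}")
```

The coding region divides evenly into 739 codons, in agreement with the predicted protein length.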

[Figure 3: nucleotide sequence of vcRNA positions 1-3027 with the deduced amino acid sequence of the NP ORF (residues 1-739); the leader, start site, and poly(A) site are labeled. The sequence is deposited under Accession No. J04337.]

FIG. 3. Viral complementary sequences for the 3' end of the Ebola Mayinga genome. Identified are the putative leader and the entire NP gene, with its transcriptional start and stop (polyadenylation) sites and coding region. The exact beginning of the NP gene at base 54 (start site) is tentative, since it was determined by chemically sequencing products derived from primer extension to the 5' end of the NP mRNA (see Fig. 4) and confirmation by direct 5' end sequence analysis has not been performed.

[Figure 4: two sequencing gels, (A) and (B), each with lanes T+C, C, A+G, and G.]

FIG. 4. DNA sequencing gels (Maxam and Gilbert, 1980) showing the 3' end (A) and the 5' end (B) of the Ebola Mayinga NP gene (vRNA-sense sequences). The transcriptional start site for the NP gene was determined by annealing a synthetic DNA primer to an area close to the 5' end of the NP mRNA, extending the primer with reverse transcriptase to the extreme 5' end, and chemically sequencing the extended copy (A). The sequence ladder reads 5' to 3' (bottom to top), with the last two bases unreadable, and represents extension to the ultimate and penultimate bases. The 5' end of the NP gene was identified by the isolation of NP mRNA clones containing the poly(A) tail. This region is shown in (B) (clone N13), and contains the 5'-most viral sequences that lead into 17 bases of the poly(A) tail (reading 3' to 5').

The predicted protein contains 739 amino acids, with a calculated molecular weight of 83.3K, approximately 20K lower than the 104K protein determined by SDS-PAGE (Kiley et al., 1980). Table 1 shows the amino acid composition of the Ebola NP as deduced from nucleic acid sequencing. The NP has a calculated net charge of -30 at neutral pH and has been shown to be phosphorylated (Elliott et al., 1985), which contributes to an even greater net negative charge. It is also noted that all three cysteine residues are located within the first quarter of the molecule from the N-terminus, and that 32 of 42 proline residues are located in the C-terminal half.
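The net charge and molecular weight quoted above can be recomputed from the residue counts in Table 1. The sketch below uses the charge convention given in the table footnote (-1 for Asp and Glu, +1 for Arg and Lys, +0.5 for His). The average residue masses are standard textbook values supplied here as an assumption; they are not taken from the paper.

```python
# Residue counts transcribed from Table 1.
COMPOSITION = {
    "Asp": 59, "Glu": 58, "Arg": 34, "His": 30, "Lys": 38,
    "Asn": 33, "Cys": 3, "Gln": 53, "Gly": 41, "Ser": 48,
    "Thr": 38, "Tyr": 21, "Ala": 53, "Ile": 29, "Leu": 68,
    "Met": 20, "Phe": 25, "Pro": 42, "Trp": 4, "Val": 42,
}

# Charge convention from the Table 1 footnote; all other residues count as 0.
CHARGE = {"Asp": -1.0, "Glu": -1.0, "Arg": 1.0, "Lys": 1.0, "His": 0.5}

# Average residue masses in daltons (assumed standard values).
RESIDUE_MASS = {
    "Gly": 57.05, "Ala": 71.08, "Ser": 87.08, "Pro": 97.12, "Val": 99.13,
    "Thr": 101.10, "Cys": 103.14, "Leu": 113.16, "Ile": 113.16,
    "Asn": 114.10, "Asp": 115.09, "Gln": 128.13, "Lys": 128.17,
    "Glu": 129.12, "Met": 131.19, "His": 137.14, "Phe": 147.18,
    "Arg": 156.19, "Tyr": 163.18, "Trp": 186.21,
}
WATER_MASS = 18.02  # one water is added for the free N- and C-termini

total_residues = sum(COMPOSITION.values())
net_charge = sum(CHARGE.get(aa, 0.0) * n for aa, n in COMPOSITION.items())
molecular_weight = (
    sum(RESIDUE_MASS[aa] * n for aa, n in COMPOSITION.items()) + WATER_MASS
)

print(f"residues: {total_residues}")
print(f"net charge: {net_charge:+.1f}")
print(f"molecular weight: {molecular_weight / 1000:.1f}K")
```

With these inputs the counts sum to 739 residues and the charge comes to -30. The mass lands close to the reported 83.3K and does not include phosphorylation.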

A hydropathic plot (Kyte and Doolittle, 1982) of the amino acid sequence shows that the protein can be divided roughly into a hydrophobic N-terminal half and a hydrophilic C-terminal half (Fig. 5). Approximately 75% of the acidic amino acids lie in the C-terminal half, with the greatest concentration located between residues 422 and 494. This region contains 28 acidic residues and 1 weakly basic (His) residue. It begins with five consecutive aspartic acids and ends with six consecutive aspartic and glutamic acid residues.
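A hydropathy profile of this kind can be sketched as a sliding-window average over the Kyte-Doolittle index, using the seven-residue window stated in the Fig. 5 legend. Averaging (rather than summing) within the window is an assumption, so the vertical scale may differ from the published plot. The two segments are short stretches read from the deduced sequence in Fig. 3, chosen only for illustration.

```python
# Kyte-Doolittle hydropathy index, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or len(sequence) < window:
        return []
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


N_TERMINAL = "MDSRPQKIWMAPSLTESDMDYHKILTAGLS"  # residues 1-30
ACIDIC_END = "DLDEDDED"                        # residues 487-494

for label, segment in (("1-30", N_TERMINAL), ("487-494", ACIDIC_END)):
    profile = hydropathy_profile(segment)
    print(label, [round(value, 2) for value in profile])
```

Positive values mark hydrophobic windows and negative values hydrophilic ones. A sequence of length n yields n - 6 points with a seven-residue window.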

Construction and in vitro expression of the NP gene coding region

To confirm that coding region begins at base 470, or at another AUG codon downstream, a plasmid was engineered to contain the entire NP gene with a BamHl immediately upstream of the ORF (see Materials and Methods). Figure 6 shows the construction and muta- genesis of the NP gene. The NP coding region was ex- cised from the resulting plasmid (pEMNP-El) and di- rectionally ligated into the cloning vector pSP64, and in vitro transcripts were generated as detailed under Materials and Methods. The transcript was found to migrate in acid-urea-agarose gels as a single species, just ahead of the natural NP mRNA (as predicted), and

TABLE 1

AMINO ACID COMPOSITION OF THE EBOLA MAYINGA NUCLEOPROTEIN, AS DETERMINED FROM NUCLEIC ACID SEQUENCE ANALYSIS

R group              Amino acid   No. residues   Percentage total
Negatively charged   Asp          59             7.90
                     Glu          58             7.85
Positively charged   Arg          34             4.60
                     His          30             4.38
                     Lys          38             5.14
Polar uncharged      Asn          33             4.47
                     Cys           3             0.41
                     Gln          53             7.17
                     Gly          41             5.54
                     Ser          48             6.50
                     Thr          38             5.14
                     Tyr          21             2.84
Nonpolar uncharged   Ala          53             7.17
                     Ile          29             3.92
                     Leu          68             9.20
                     Met          20             2.70
                     Phe          25             3.38
                     Pro          42             5.68
                     Trp           4             0.54
                     Val          42             5.68

Net charge = -30 (a)   Calculated MW = 83.3K

(a) The net charge was calculated by assigning charged amino acids a value of -1 (Asp + Glu), +1 (Arg + Lys), or +0.5 (His). Histidine was given a lower value due to its weak charge at neutral pH (pI 7.6).
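The net charge in Table 1 can be re-derived from the residue counts with the rule given in the table footnote. The following is a minimal sketch, not the authors' program; the names `COMPOSITION`, `CHARGE`, and `net_charge` are illustrative, and the serine count is taken as 48, the value consistent with a total of 739 residues.

```python
# Residue counts transcribed from Table 1 (Ebola Mayinga NP).
# Ser is taken as 48 (assumption: the value that makes the total 739).
COMPOSITION = {
    "Asp": 59, "Glu": 58,
    "Arg": 34, "His": 30, "Lys": 38,
    "Asn": 33, "Cys": 3, "Gln": 53, "Gly": 41,
    "Ser": 48, "Thr": 38, "Tyr": 21,
    "Ala": 53, "Ile": 29, "Leu": 68, "Met": 20,
    "Phe": 25, "Pro": 42, "Trp": 4, "Val": 42,
}

# Charge values from the table footnote: -1 for Asp/Glu,
# +1 for Arg/Lys, +0.5 for His; all other residues count as 0.
CHARGE = {"Asp": -1.0, "Glu": -1.0, "Arg": 1.0, "Lys": 1.0, "His": 0.5}


def net_charge(composition, charge=CHARGE):
    """Sum the per-residue charge over all residues in the composition."""
    return sum(charge.get(res, 0.0) * n for res, n in composition.items())


total_residues = sum(COMPOSITION.values())
print(total_residues)           # 739
print(net_charge(COMPOSITION))  # -30.0
```

The total of 739 residues agrees with the predicted length of the protein, and the sum -117 + 72 + 15 reproduces the stated net charge of -30.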


[Plot for Fig. 5: hydropathic values (vertical axis, ticks from 10 to -20) against amino acid sequence number (horizontal axis, 0 to 700), with separate tracks marking negative and positive residues.]

FIG. 5. Hydropathic plot (Kyte and Doolittle, 1982) of the predicted amino acid sequence for the Ebola Mayinga NP. Hydrophobic regions are shaded above the midline, and hydrophilic shaded below. Negatively charged residues (Asp + Glu) and strong positively charged residues (Lys + Arg) are indicated below the hydropathic plot. A window size of seven residues was used to generate hydropathic values for each data point.
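A sliding-window hydropathy profile of the kind shown in Fig. 5 can be sketched as follows. This is an illustration under stated assumptions rather than the program used for the figure: the scale values are the standard Kyte and Doolittle (1982) hydropathy indices, the window is seven residues as in the legend, and each point is the sum over the window (a mean would differ only by a constant factor). The function name `hydropathy_profile` and the toy peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy indices (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Return one summed hydropathy value per window position.

    Assumption: values are summed (not averaged) over the window.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]
    return [
        round(sum(scores[i:i + window]), 2)
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy peptide: an acidic stretch followed by a hydrophobic one.
toy = "MDDDDDEELLVVIIAAG"
for start, value in enumerate(hydropathy_profile(toy), start=1):
    print(start, value)
```

Positive values correspond to hydrophobic windows and negative values to hydrophilic ones, so a protein with a hydrophobic N-terminal half and an acidic C-terminal half gives a profile that falls from left to right.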

results of in vitro translation of this transcript are shown in Fig. 7. The transcript directs the synthesis of an authentic NP, which comigrates with the NP produced from translated NP mRNA. Translation reactions

[Plasmid construction diagram for Fig. 6, with BamHI and KpnI restriction sites and the pUC18 vector labeled.]

FIG. 6. Construction of plasmids containing the entire NP gene and part of the second gene (vRNA sequences 1-3488). Cloned viral sequences representing nontranslated regions of the NP gene and the second gene of Ebola are shown in white, the NP coding region is shown shaded, and pUC18 sequences are shown in black.

primed with either NP mRNA or the in vitro transcript direct the synthesis of not only NP, but at least five smaller comigrating species. Since these products are

[Gel image for Fig. 7; migration positions marked at the left edge: NP, VP40, VP35, VP30, VP24.]

FIG. 7. SDS-PAGE (12% slab gel) of in vitro translation products immunoprecipitated using a pooled human anti-Ebola serum. Translation reactions were primed with Vero E6 total cell RNA (lane 1), Ebola Mayinga mRNA (lane 2), and an in vitro generated transcript containing the Ebola Mayinga NP coding region (lane 3). Indicated at the left edge are the migration positions of five viral structural proteins. The arrow at the right edge identifies a translation product directed by the in vitro transcript that comigrates with authentic Ebola NP.


immunoprecipitated and are not evident in the control cell RNA lane, it is assumed that they arise from the NP coding region, possibly as a result of in-frame internal initiation. It should be noted that these additional protein bands are normally not as prominent as seen in Fig. 7, but are invariably present when the NP mRNA or transcript is translated in vitro.

DISCUSSION

Nucleotide sequence analysis of the 3’ end of the Ebola Mayinga genome has identified a putative leader sequence, has delineated the nucleoprotein gene, and has identified the NP coding region. These findings indicate that the genome is organized in a manner similar to those of rhabdoviruses and paramyxoviruses (Strauss and Strauss, 1983).

The putative leader is at most 53 nucleotides in length, which is comparable to leader sequences of other nonsegmented, negative-strand RNA viruses (Giorgi et al., 1983; Keene et al., 1980; Kurilla et al., 1985). Our sequence data for the extreme 3’ end of the genome confirm those of direct RNA sequencing studies (Kiley et al., 1986), which demonstrated a high degree of sequence conservation between the Zaire and Sudan subtypes of Ebola, lesser homology between Ebola and Marburg viruses, and no homology between Ebola and rhabdoviruses or paramyxoviruses. These findings imply that Ebola and Marburg viruses are closely related evolutionarily, but are only remotely related to other nonsegmented negative-strand RNA viruses.

The Ebola NP gene was delineated by determining the transcriptional start and stop sites (Fig. 4). The start site for the NP gene was shown to begin at position 54 by primer extension to the ultimate base at the 5’ end of the NP mRNA and chemical sequencing of the reaction products. The extension reaction results in two products, differing in length by 1 base, which has previously been attributed to the addition of an extra base at the 5’ end by RT, with the shorter copy representing the actual start of the gene (Gupta and Kingsbury, 1984). If this is true for the Ebola Mayinga NP mRNA, the results shown in Fig. 4A would then indicate that, if present, the m7G cap would be linked to a uridine residue. This is contrary to the conviction that 5’ caps are linked to purines, and it is our feeling that the shorter version is actually a result of early termination of primer extension, caused by a cap structure, which would allow for its linkage to an adenosine residue. Thus, Fig. 3 depicts a putative start site that, if not exact, is at least within 1 base of the actual start. The transcriptional stop (polyadenylation) site was determined by sequencing a

TABLE 2

COMPARISON OF THE 3’ AND 5’ ENDS OF THE NUCLEOPROTEIN GENES OF EBOLA AND THOSE OF SELECTED PARAMYXOVIRUSES AND RHABDOVIRUSES (a)

Virus (b)   3’ end (start)    5’ end (poly(A) site)
Ebola       UACUCCUUCUAAUU    UAAUUCUUUUUU
Sendai      UCCCAGUUUC        AUUCUUUUU
HP3         UCCUAAUUUC        UUUAUUCUUUUU
RSV         CCCCGUUUAU        UCAAUUAUUUUUU
Measles     UCCUAAGUUC        AUUAUUUUUU
NDV         UGCCCAUCUUCC      AAUCUUUUUU
VSV         UUGUCAUUAG        AUACUUUUUUU
Rabies      UUGUGGAGUA        GUACUUUUUUU

(a) vRNA sequences.
(b) Sources of sequence data and abbreviations: Sendai (Morgan et al., 1984); HP3, human parainfluenza type 3 (Jambou et al., 1986; Spriggs and Collins, 1986); RSV, respiratory syncytial virus (Collins et al., 1985, 1986); measles (Bellini et al., 1986; Rozenblatt et al., 1985); NDV, Newcastle disease virus (Ishida et al., 1986); VSV, vesicular stomatitis virus (Gallione et al., 1981); rabies (Tordo et al., 1986).

mRNA clone that contains part of the poly(A) tail (Fig. 4B).

The transcriptional signals of the Ebola NP gene are similar to those of other nonsegmented negative-strand RNA viruses (Chambers et al., 1986; Gupta and Kingsbury, 1984; Rose, 1980; Strauss and Strauss, 1983) as seen in Table 2. These genes begin with a pyrimidine, usually a uridine residue, corresponding to a complementary purine residue on the mRNA that is presumably capped. Within the start sites, one can see that the third base for Ebola and most of the paramyxoviruses is a cytosine, and that in these same viruses the triplet UUC is also seen. Outside of these similarities, the start sequences of these virus genes lack strong homology. The polyadenylation sites, on the other hand, show greater homology, due largely to the poly(U) region, which is thought to function as a signal for repeated copying of the poly(U) sequence (stuttering) by the viral polymerase during transcription. The similarity in the transcriptional strategy that Ebola and these viruses use, together with the basic genomic organization, is additional evidence for a common lineage.
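The features noted for the sequences in Table 2 can be checked mechanically. The sketch below is only an illustration of those checks; the dictionary names `START` and `POLYA` and the helper `trailing_u_run` are introduced here and are not part of the original analysis.

```python
# vRNA-sense sequences transcribed from Table 2.
START = {
    "Ebola": "UACUCCUUCUAAUU",
    "Sendai": "UCCCAGUUUC",
    "HP3": "UCCUAAUUUC",
    "RSV": "CCCCGUUUAU",
    "Measles": "UCCUAAGUUC",
    "NDV": "UGCCCAUCUUCC",
    "VSV": "UUGUCAUUAG",
    "Rabies": "UUGUGGAGUA",
}
POLYA = {
    "Ebola": "UAAUUCUUUUUU",
    "Sendai": "AUUCUUUUU",
    "HP3": "UUUAUUCUUUUU",
    "RSV": "UCAAUUAUUUUUU",
    "Measles": "AUUAUUUUUU",
    "NDV": "AAUCUUUUUU",
    "VSV": "AUACUUUUUUU",
    "Rabies": "GUACUUUUUUU",
}


def trailing_u_run(seq):
    """Length of the uninterrupted run of U residues ending the sequence."""
    return len(seq) - len(seq.rstrip("U"))


# Every start site begins with a pyrimidine (U or C).
starts_with_pyrimidine = all(s[0] in "UC" for s in START.values())

# Viruses whose third start-site base is C, and those containing UUC.
third_base_c = [v for v, s in START.items() if s[2] == "C"]
has_uuc = [v for v, s in START.items() if "UUC" in s]

# Length of the poly(U) tract at each polyadenylation site.
u_runs = {v: trailing_u_run(s) for v, s in POLYA.items()}

print(starts_with_pyrimidine)
print(third_base_c)
print(has_uuc)
print(u_runs)
```

With the sequences as tabulated, the third-base cytosine is found in Ebola and the paramyxoviruses but not in VSV or rabies, and the poly(U) tracts run from five to seven residues.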

Analysis of the vcRNA sequence for the 3’ end of the Ebola genome identified a potential AUG initiation codon beginning at position 470, which initiates an ORF of 2217 bases. The sequences flanking this AUG codon put it in a good context for initiating translation and closely match the sequence 5’-CCACCAUGG which has been identified as the optimal eukaryotic sequence


for ribosome initiation of translation (Kozak, 1986a), where the A at position -3 (A of AUG = +1) and the G at position +4 are most critical for efficient initiation. The NP mRNA nontranslated sequences of 416 bases at the 5’ end represent an extremely long region that evidently does not present an obstacle to translation, under the scanning model of Kozak (Kozak, 1986a,b).
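The position numbering used for the initiation context (A of AUG = +1, no position 0) is easy to get wrong by one, so a small sketch may help. It inspects only the two positions singled out in the text, -3 and +4; the function name `kozak_context` is hypothetical, and the only sequence used is the consensus 5’-CCACCAUGG quoted above, since the actual flanking sequence of the NP AUG is not reproduced in this section.

```python
def kozak_context(mrna, aug_index):
    """Report the bases at positions -3 and +4 around an AUG.

    mrna      -- mRNA-sense sequence (A, C, G, U)
    aug_index -- 0-based index of the A of the AUG codon

    Numbering follows the text: the A of AUG is +1 and there is no
    position 0, so -3 is three bases upstream of the A and +4 is the
    base immediately after the G of AUG.
    """
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 3 or aug_index + 3 >= len(mrna):
        raise ValueError("flanking sequence too short")
    minus3 = mrna[aug_index - 3]
    plus4 = mrna[aug_index + 3]
    return {
        "minus3": minus3,
        "plus4": plus4,
        "minus3_is_A": minus3 == "A",
        "plus4_is_G": plus4 == "G",
    }


consensus = "CCACCAUGG"
print(kozak_context(consensus, consensus.index("AUG")))
```

The same coordinates give the length of the 5’ nontranslated region directly: with the gene starting at position 54 and the AUG at position 470, the untranslated stretch is 470 - 54 = 416 bases.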

Translation of an in vitro generated transcript containing the NP coding region resulted in the synthesis of an authentic NP (Fig. 7). This finding demonstrates that the predicted translation initiation site is at least 417 bases from the 5’ end of the NP mRNA. In addition to the NP, translation reactions for both an Ebola Mayinga mRNA preparation and the in vitro transcript resulted in the synthesis of five comigrating bands. A likely possibility for the presence of these proteins is internal initiation at AUG codons that are in a favorable context for recognition by scanning ribosomes. This internal initiation may arise from limitations in the translation system, or from the lack of a 5’ cap structure in the in vitro transcript and loss of caps in mRNA preparations during purification and storage.
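To make the idea of in-frame internal initiation concrete, the sketch below lists AUG codons that share the reading frame of the first codon and the length of the N-terminally truncated product each would give. It is a toy illustration: the function names and the 18-base example are hypothetical, the input is assumed to hold coding codons only (no stop codon), and no attempt is made to score the sequence context of each AUG.

```python
def in_frame_augs(orf):
    """Return 0-based codon indices of AUG codons in the ORF's frame."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [i // 3 for i in range(0, len(orf), 3) if orf[i:i + 3] == "AUG"]


def truncated_lengths(orf):
    """Product lengths (in residues) for initiation at internal AUGs."""
    n_codons = len(orf) // 3
    return [n_codons - codon for codon in in_frame_augs(orf) if codon > 0]


# Hypothetical ORF: AUG GCU AUG AAA UGC UUU. The AUG spanning the
# AAA/UGC boundary is out of frame and is therefore not reported.
toy_orf = "AUGGCUAUGAAAUGCUUU"
print(in_frame_augs(toy_orf))      # [0, 2]
print(truncated_lengths(toy_orf))  # [4]

# The same bookkeeping applied to the reported ORF length.
print(2217 // 3)  # 739
```

Because all such products end at the same stop codon, they share the C-terminal portion of the full-length protein, which is consistent with their immunoprecipitation by anti-Ebola serum.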

Analysis of the NP amino acid sequence shows it to be divided into a hydrophobic N-terminus and a hydrophilic and extremely acidic C-terminus (Fig. 5). The acidic nature of the C-terminus of the Ebola NP is analogous to that of the Sendai virus NP (Morgan et al., 1984). For Sendai virus, this region of the NP is thought to bind the positively charged matrix protein during maturation of the virion, and for Ebola virus it may have a similar function. The acidic region of the Ebola NP is also similar to the acidic N-terminal half of the VSV NS protein, a protein that has an important but undefined role in the transcriptional complex of VSV (Banerjee, 1987).

The NP of Ebola Mayinga virus has an observed molecular weight of 104K, as determined by SDS-PAGE, which is considerably larger than the calculated value of 83.3K, derived from the predicted amino acid sequence. Comparisons of sequence-calculated and SDS-PAGE-determined values by other researchers have resulted in similar observations for the nucleoproteins of related viruses (Galinski et al., 1986; Gallione et al., 1981; Tordo et al., 1986). The lower calculated value can be partially explained by omission of phosphate groups in calculations, but even if all of the Ebola NP serine and threonine residues are phosphorylated, the additional mass would add only 7K to the calculated value. The effect of the extreme net negative charge may be responsible for the discrepancy, since migration of a given protein in SDS-PAGE is influenced by binding of SDS to a peptide sequence. Recently, it was demonstrated that single amino acid mutations in the VSV NS protein, which resulted in changes in the charge of the molecule, had a dramatic effect on the apparent molecular weight when determined by SDS-PAGE (Rae and Elliott, 1986). Single amino acid changes from Glu to Lys and Glu to Gly resulted in a drop in the apparent molecular weight, from the wild type 59K to 50K and 54.5K, respectively. Despite the lower calculated molecular weight, the Ebola NP is still larger than the 42-68K range reported for other nonsegmented negative-strand viruses (Collins et al., 1985; Galinski et al., 1986; Ishida et al., 1986; Morgan et al., 1984; Rozenblatt et al., 1985; Sakai et al., 1987; Tordo et al., 1986).
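The estimate of roughly 7K for full phosphorylation can be reproduced with simple arithmetic. The sketch below assumes that each phosphorylated serine or threonine gains one HPO3 group (about 79.98 Da), uses the serine and threonine counts from Table 1 (48 and 38), and treats the variable names as illustrative only.

```python
# Residue counts from Table 1.
N_SER = 48
N_THR = 38

# Mass added per phosphorylated residue (HPO3), in daltons.
# Assumption: one phosphate per Ser/Thr, every site occupied.
PHOSPHATE_DA = 79.98

CALCULATED_MW_KDA = 83.3   # from the predicted amino acid sequence
OBSERVED_MW_KDA = 104.0    # apparent value from SDS-PAGE

added_kda = (N_SER + N_THR) * PHOSPHATE_DA / 1000.0
max_phospho_mw_kda = CALCULATED_MW_KDA + added_kda
unexplained_kda = OBSERVED_MW_KDA - max_phospho_mw_kda

print(round(added_kda, 2))           # 6.88
print(round(max_phospho_mw_kda, 1))  # 90.2
print(round(unexplained_kda, 1))     # 13.8
```

Even under this upper-bound assumption, the phosphorylated protein would reach only about 90K, leaving some 14K of the apparent 104K unaccounted for by mass alone.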

Computer-assisted comparisons of the nucleic and amino acid sequences (Devereux et al., 1984; Wilbur and Lipman, 1983) of the nucleoprotein genes of Ebola virus and the viruses listed in Table 2 failed to show any significant homology. Searches of the viral section of the NIH-GenBank Sequence Library (release 56.0, 7/88) produced similar results. The NP is usually highly conserved within viral families, and comparisons can serve to measure relatedness. The findings above, however, and the lack of antigenic cross-reactions between Ebola virus and other nonsegmented negative-strand RNA viruses may indicate an early evolutionary divergence.

In conclusion, we have cloned and sequenced the Ebola Mayinga NP gene, which represents the first characterization of a filovirus gene. In addition, we have expressed the NP gene product by translation of an in vitro generated transcript containing the NP ORF, which verifies the predicted coding region and the location of the NP gene at the extreme 3’ end of the genome. Our findings clearly demonstrate the similarity of Ebola virus to rhabdoviruses and paramyxoviruses, by virtue of the genomic organization at the 3’ end and the transcriptional strategy that these viruses use in synthesizing mRNA. Studies aimed at further defining the Ebola genome and expressing viral genes are in progress and will provide data and reagents useful in understanding the molecular biology of the extremely virulent human pathogens that constitute the family Filoviridae.

ACKNOWLEDGMENTS

We thank William I. Bellini for his helpful discussion and suggestions, Olen M. Kew and Baldev K. Nottay for their help in RNA sequencing, Carolyn D. Sanchez for her assistance in computer programming, and those persons at the Centers for Disease Control who contributed to the preparation of this report.

This research was partially supported by an interagency agreement with the U.S. Army Medical Research Institute of Infectious Diseases (Log No. 62100006).


REFERENCES

AUPERIN, D. D., SASSO, D. R., and MCCORMICK, J. B. (1986). Nucleotide sequence of the glycoprotein gene and intergenic region of the Lassa virus S genome RNA. Virology 154, 155-167.

BANERJEE, A. K. (1987). The transcription complex of vesicular stomatitis virus. Cell 48, 363-364.

BARON, R. C., MCCORMICK, J. B., and ZUBEIR, O. A. (1983). Ebola virus disease in southern Sudan: Hospital dissemination and intrafamilial spread. Bull. W.H.O. 61, 997-1003.

BELLINI, W. J., ENGLUND, G., RICHARDSON, C. D., ROZENBLATT, S., and LAZZARINI, R. A. (1986). Matrix genes of measles virus and canine distemper virus: Cloning, nucleotide sequences, and deduced amino acid sequences. J. Virol. 58, 408-416.

BISHOP, D. H. L., GOULD, K. G., AKASHI, H., and CLERX-VAN HAASTER, C. (1982). The complete sequence and coding content of snowshoe hare bunyavirus small (S) viral RNA species. Nucleic Acids Res. 10, 3703-3713.

BOWEN, E. T. W., PLATT, G. S., LLOYD, G., BASKERVILLE, A., HARRIS, W. J., and VELLA, E. E. (1977). Viral haemorrhagic fever in southern Sudan and northern Zaire. Lancet 1, 571-573.

BUCHMEIER, M. J., DEFRIES, R. U., MCCORMICK, J. B., and KILEY, M. P. (1983). Comparative analysis of the structural polypeptides of Ebola viruses from Sudan and Zaire. J. Infect. Dis. 147, 276-281.

CHAMBERS, P., MILLAR, N., BINGHAM, R. W., and EMERSON, P. T. (1986). Molecular cloning of complementary DNA to Newcastle disease virus, and nucleotide sequence analysis of the junction between the genes encoding the haemagglutinin-neuraminidase and the large protein. J. Gen. Virol. 67, 475-486.

COLLINS, P. L., ANDERSON, K., LANGER, S. J., and WERTZ, G. W. (1985). Correct sequence for the major nucleocapsid protein mRNA of respiratory syncytial virus. Virology 146, 69-77.

COLLINS, P. L., DICKENS, L. E., BUCKLER-WHITE, A., OLMSTED, R. A., SPRIGGS, M. K., CAMARGO, E., and COELINGH, K. V. W. (1986). Nucleotide sequences for the gene junctions of human respiratory syncytial virus reveal distinctive features of intergenic structure and gene order. Proc. Natl. Acad. Sci. USA 83, 4594-4598.

COX, N. J., MCCORMICK, J. B., JOHNSON, K. M., and KILEY, M. P. (1983). Evidence for two subtypes of Ebola virus based on oligonucleotide mapping of RNA. J. Infect. Dis. 147, 272-275.

DEVEREUX, J., HAEBERLI, P., and SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.

ELLIOTT, L. H., KILEY, M. P., and MCCORMICK, J. B. (1985). Descriptive analysis of Ebola virus proteins. Virology 147, 169-176.

GALINSKI, M. S., MINK, T. A., LAMBERT, D. M., WECHSLER, S. L., and PONS, M. W. (1986). Molecular cloning and sequence analysis of the human parainfluenza 3 virus RNA encoding the nucleocapsid protein. Virology 149, 139-151.

GALLIONE, C. J., GREENE, J. R., IVERSON, L. E., and ROSE, J. K. (1981). Nucleotide sequences of the mRNA’s encoding the vesicular stomatitis virus N and NS proteins. J. Virol. 39, 529-535.

GIORGI, C., BLUMBERG, B., and KOLAKOFSKY, D. (1983). Sequence determination of the (+) leader RNA regions of the vesicular stomatitis virus Chandipura, Cocal, and Piry serotype genomes. J. Virol. 46, 125-130.

GRUNSTEIN, M., and HOGNESS, D. S. (1975). Colony hybridization: A method for the isolation of cloned DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72, 3961-3965.

GUBLER, U., and HOFFMAN, G. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.

GUPTA, K. C., and KINGSBURY, D. W. (1984). Complete sequences of the intergenic and mRNA start signals in the Sendai virus genome: Homologies with the genome of vesicular stomatitis virus. Nucleic Acids Res. 12, 3829-3841.

HEYMANN, D. L., WEISFELD, J. S., WEBB, P. A., JOHNSON, K. M., CAIRNS, T., and BERQUIST, H. (1980). Ebola hemorrhagic fever: Tandala, Zaire, 1977-1978. J. Infect. Dis. 142, 372-376.

ISH-HOROWICZ, D., and BURKE, J. F. (1981). Rapid and efficient cosmid cloning. Nucleic Acids Res. 9, 2989-2998.

ISHIDA, N., TAIRA, H., OMATA, T., MIZUMOTO, K., HATTORI, S., IWASAKI, K., and KAWAKITA, M. (1986). Sequence of 2,617 nucleotides from the 3’ end of Newcastle disease virus genome RNA and the predicted amino acid sequence of viral NP protein. Nucleic Acids Res. 14, 6551-6564.

JAMBOU, R. C., ELANGO, N., VENKATESAN, S., and COLLINS, P. L. (1986). Complete sequence of the major nucleocapsid protein gene of human parainfluenza type 3 virus: Comparison with other negative strand viruses. J. Gen. Virol. 67, 2543-2548.

JOHNSON, K. M., WEBB, P. A., LANGE, J. V., and MURPHY, F. A. (1977). Isolation and partial characterization of a new virus causing acute haemorrhagic fever in Zaire. Lancet 1, 569-571.

KEENE, J. D., SCHUBERT, M., and LAZZARINI, R. A. (1980). Intervening sequence between the leader region and the nucleocapsid gene of vesicular stomatitis virus RNA. J. Virol. 33, 789-794.

KILEY, M. P., REGNERY, R. L., and JOHNSON, K. M. (1980). Ebola virus: Identification of virion structural proteins. J. Gen. Virol. 49, 333-341.

KILEY, M. P., WILUSZ, J., MCCORMICK, J. B., and KEENE, J. D. (1986). Conservation of the 3’ terminal nucleotide sequences of Ebola and Marburg Virus. Virology 149, 251-254.

KOZAK, M. (1986a). Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.

KOZAK, M. (1986b). Regulation of protein synthesis in virus-infected animal cells. Adv. Virus Res. 31, 229-292.

KRIEG, P. A., and MELTON, D. A. (1984). Functional messenger RNAs are produced by SP6 in vitro transcription of cloned cDNAs. Nucleic Acids Res. 12, 7057-7070.

KURILLA, M. G., STONE, H. O., and KEENE, J. D. (1985). RNA sequence and transcriptional properties of the 3’ end of the Newcastle disease virus genome. Virology 145, 203-212.

KYTE, J., and DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.

MANIATIS, T., FRITSCH, E. F., and SAMBROOK, J. (1982). “Molecular Cloning: A Laboratory Manual.” Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.

MAXAM, A., and GILBERT, W. (1980). Sequencing end-labeled DNA with base-specific chemical cleavages. In “Methods in Enzymology” (L. Grossman and K. Moldave, Eds.), Vol. 65, pp. 499-560. Academic Press, New York.

MCCORMICK, J. B., BAUER, S. P., ELLIOTT, L. H., WEBB, P. A., and JOHNSON, K. M. (1983). Biologic differences between strains of Ebola virus from Zaire and Sudan. J. Infect. Dis. 147, 264-267.

MELTON, D. A., KRIEG, P. A., REBAGLIATI, M. R., MANIATIS, T., ZINN, K., and GREEN, M. R. (1984). Efficient in vitro synthesis of biologically active RNA and RNA hybridization probes from plasmids containing a bacteriophage SP6 promoter. Nucleic Acids Res. 12, 7035-7056.

MORGAN, E. M., RE, G. G., and KINGSBURY, D. W. (1984). Complete sequence of the Sendai virus NP gene from a cloned insert. Virology 135, 279-287.

OOSTRA, B. A., HARVEY, R., ELY, B. K., MARKHAM, A. F., and SMITH, A. E. (1983). Transforming activity of polyoma virus middle-T antigen probed by site-directed mutagenesis. Nature (London) 304, 456-459.


RAE, B. P., and ELLIOTT, R. M. (1986). Characterization of the mutations responsible for the electrophoretic mobility differences in the NS proteins of vesicular stomatitis virus New Jersey complementation group E mutants. J. Gen. Virol. 67, 2635-2643.

REGNERY, R. L., JOHNSON, K. M., and KILEY, M. P. (1980). Virion nucleic acid of Ebola virus. J. Virol. 36, 465-469.

REGNERY, R. L., JOHNSON, K. M., and KILEY, M. P. (1981). Marburg and Ebola viruses: Possible members of a new group of negative strand viruses. In “The Replication of Negative Strand Viruses” (D. H. L. Bishop and R. W. Compans, Eds.), pp. 971-977. Elsevier/North-Holland, New York.

RICHMAN, D. D., CLEVELAND, P. H., MCCORMICK, J. B., and JOHNSON, K. M. (1983). Antigenic analysis of strains of Ebola virus: Identification of two Ebola virus serotypes. J. Infect. Dis. 147, 268-271.

RICO-HESSE, R., PALLANSCH, M. A., NOTTAY, B. K., and KEW, O. M. (1987). Geographic distribution of wild poliovirus type 1 genotypes. Virology 160, 311-322.

ROSE, J. K. (1980). Complete intergenic and flanking gene sequences from the genome of vesicular stomatitis virus. Cell 19, 415-421.

ROSEN, J. M., WOO, S. L. C., HOLDER, J. W., MEANS, A. T., and O’MALLEY, B. (1975). Preparation and preliminary characterization of purified ovalbumin messenger RNA from the hen oviduct. Biochemistry 14, 69-78.

ROZENBLATT, S., EISENBERG, O., BEN-LEVY, R., LAVIE, V., and BELLINI, W. J. (1985). Sequence homology within the morbilliviruses. J. Virol. 53, 684-690.

SAKAI, Y., SUZU, S., SHIODA, T., and SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus genome: Its 3’ end and the genes of NP, P, C and M proteins. Nucleic Acids Res. 15, 2927-2944.

SANCHEZ, A., and KILEY, M. P. (1987). Identification and analysis of Ebola virus messenger RNA. Virology 157, 414-420.

SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

SPRIGGS, M. K., and COLLINS, P. L. (1986). Human parainfluenza virus type 3: Messenger RNAs, polypeptide coding assignments, intergenic sequences, and genetic map. J. Virol. 59, 646-654.

STRAUSS, E. G., and STRAUSS, J. H. (1983). Replication strategies of the single-stranded RNA viruses of eukaryotes. Curr. Top. Microbiol. Immunol. 105, 1-98.

TORDO, N., POCH, O., ERMINE, A., and KEITH, G. (1986). Primary structure of leader RNA and nucleoprotein genes of the rabies genome: Segmented homology with VSV. Nucleic Acids Res. 14, 2671-2683.

WEBB, P. A., JOHNSON, K. M., WULFF, H., and LANGE, J. V. (1978). Some observations on the properties of Ebola virus. In “Ebola Virus Haemorrhagic Fever” (S. R. Pattyn, Ed.), pp. 91-94. Elsevier/North-Holland, New York.

WILBUR, W. J., and LIPMAN, D. J. (1983). Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80, 726-730.

ZIMMERN, D., and KAESBERG, P. (1978). 3’ Terminal nucleotide sequence of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 75, 4257-4261.