Orphon spliced-leader sequences form part of a repetitive element ...

6
1030-1035 Nucleic Acids Research, 1995, Vol. 23, No. 6 1995 Oxford University Press Orphon spliced-leader sequences form part of a repetitive element in Angiostrongylus cantonensis G. W. P. Joshua*, F. B. Perler 1 and C. C. Wang 2 Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan, Republic of China, 1 New England Biolabs Inc., 32 Tozer Road, Beverly, MA 01915, USA and 2 University of California, San Francisco, CA 94143, USA Received October 19, 1994; Revised and Accepted February 7, 1995 GenBank accession nos U13190and U13191 ABSTRACT In nematodes, the 22 nucleotlde (nt) spliced leader (SL) Is normally encoded by a multi-copy, tandemly reiter- ated SL gene and is trans-spliced from SL-RNA onto the 5' end of a subset of mRNAs. We have found that the SL is also encoded at multiple (>100) orphon genomic sites in the parasitic nematode Angios- trongylus cantonensis. At these sites the sequence forms part of a 198 bp repetitive element (designated con-198). Transcription from two genomic elements that contain the con-198 sequence has been character- ised. At one element (G-2) an -850 nt RNA with an internal SL Is transcribed. At the other (G-1), transcrip- tion takes place 3 kb downstream of the con-198 sequence. INTRODUCTION Angiostrongylus cantonensis is a parasitic nematode normally found, at the adult stage, as a lung worm in rats. After oral infection it has a migratory period that includes infection of the brain. When it infects humans, development is arrested at the brain and worms reside in the sub-arachnoid space, causing eosinophilic meningitis (I). Like the mRNAs of other nematodes (2-5), kinetoplastid protozoa (6), trematodes (7) and Euglena (8), mRNAs of A.cantonensis include in their maturation the /ra/w-splicing of a 5' untranslated leader sequence. In nematodes this process occurs on a sub-set of mRNAs (3). Nematode genes may also contain introns, necessitating both cis- and rranj-splicing of mRNAs. In all nematodes studied to date the frans-spliced leader sequence (SL) is 22 nucleotides (nt) long and is absolutely conserved (5'-GGTTTAATTACCCAAGTTTGAG-3'). A second SL se- quence has been found in Caenorhabditis elegans (9). The mechanism of /rans-splicing has been analysed using cell-free extracts of Ascaris suum (10-12) and the SL-gene and SL-RNA have been described for a number of nematodes (2-5) including A.cantonensis (13). In A.cantonensis the SL gene is multi-copy and, unlike that of other nematodes, does not alternate with the 5S RNA gene. The SL-RNA transcribed from the SL-gene is 110 nt long with the 22 nt SL sequence at the 5' end. It possesses an SM antigen binding site and has a secondary structure which resembles that predicted for SL-RNAs of both kinetoplastids and nematodes (14). Thus, fran^-splicing of the SL sequence is thought to occur in A.cantonensis as it does in other nematodes. The SL sequence of both kinetoplastids and nematodes has been found at genomic locations other than the tandemly reiterated SL-genes participating in rra/u-splicing. These have been termed 'orphon' sequences (15). In kinetoplastids, orphon SL sequences are associated with DNA homologous to the intergenic regions of the SL gene repeat. In addition, the SL gene repeat may be interrupted by regularly interspersed transposable inserted elements; the CRE1 element in Crithidia fasciculata (16), and the SLACS (17,18) and MAE (19) elements in Trypanosoma brucei. In contrast, orphon SL sequences in nematodes are not associated with other regions of the SL gene repeat. In Brugia malayi, no transcription of orphon SL sequences is detected (4) whereas in Onchocerca volvulus transcription from several different DNAs with an orphon SL sequence is seen (20). In this work we initially set out to identify /ra/w-spliced mRNAs, by screening an A.cantonensis cDNA library with the anti-sense 22 nt SL sequence to identify full length cDNAs with the r/ww-spliced leader sequence. Wereportrather the identifica- tion of cDNAs with an internal rather than 5' SL sequence, the isolation of corresponding genomic elements in which the SL sequence forms part of a 198 bp repetitive element, and transcription from these genomic elements. MATERIALS AND METHODS Parasites The A.cantonensis life-cycle was maintained through Achatina fulica snails and Sprague Dawley rats. Adult A.cantonensis were extracted from rat pulmonary blood vessels 40 days post infection. Nucleic acids methods Nucleic acids were extracted as described previously (13). A cDNA library was constructed from adult A.cantonensis poly (A) + RNA in A.GT 11 following standard procedures (21) using an Amersham cDNA construction kit with random hexamer * To whom correspondence should be addressed Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821 by guest on 10 April 2018

Transcript of Orphon spliced-leader sequences form part of a repetitive element ...

Page 1: Orphon spliced-leader sequences form part of a repetitive element ...

1030-1035 Nucleic Acids Research, 1995, Vol. 23, No. 6 1995 Oxford University Press

Orphon spliced-leader sequences form part of arepetitive element in Angiostrongylus cantonensisG. W. P. Joshua*, F. B. Perler1 and C. C. Wang2

Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan, Republic of China, 1New EnglandBiolabs Inc., 32 Tozer Road, Beverly, MA 01915, USA and 2University of California, San Francisco, CA 94143,USA

Received October 19, 1994; Revised and Accepted February 7, 1995 GenBank accession nos U13190and U13191

ABSTRACT

In nematodes, the 22 nucleotlde (nt) spliced leader (SL)Is normally encoded by a multi-copy, tandemly reiter-ated SL gene and is trans-spliced from SL-RNA ontothe 5' end of a subset of mRNAs. We have found thatthe SL is also encoded at multiple (>100) orphongenomic sites in the parasitic nematode Angios-trongylus cantonensis. At these sites the sequenceforms part of a 198 bp repetitive element (designatedcon-198). Transcription from two genomic elementsthat contain the con-198 sequence has been character-ised. At one element (G-2) an -850 nt RNA with aninternal SL Is transcribed. At the other (G-1), transcrip-tion takes place 3 kb downstream of the con-198sequence.

INTRODUCTION

Angiostrongylus cantonensis is a parasitic nematode normallyfound, at the adult stage, as a lung worm in rats. After oralinfection it has a migratory period that includes infection of thebrain. When it infects humans, development is arrested at thebrain and worms reside in the sub-arachnoid space, causingeosinophilic meningitis (I).

Like the mRNAs of other nematodes (2-5), kinetoplastidprotozoa (6), trematodes (7) and Euglena (8), mRNAs ofA.cantonensis include in their maturation the /ra/w-splicing of a5' untranslated leader sequence. In nematodes this process occurson a sub-set of mRNAs (3). Nematode genes may also containintrons, necessitating both cis- and rranj-splicing of mRNAs. Inall nematodes studied to date the frans-spliced leader sequence(SL) is 22 nucleotides (nt) long and is absolutely conserved(5'-GGTTTAATTACCCAAGTTTGAG-3'). A second SL se-quence has been found in Caenorhabditis elegans (9).

The mechanism of /rans-splicing has been analysed usingcell-free extracts of Ascaris suum (10-12) and the SL-gene andSL-RNA have been described for a number of nematodes (2-5)including A.cantonensis (13). In A.cantonensis the SL gene ismulti-copy and, unlike that of other nematodes, does not alternatewith the 5S RNA gene. The SL-RNA transcribed from theSL-gene is 110 nt long with the 22 nt SL sequence at the 5' end.

It possesses an SM antigen binding site and has a secondarystructure which resembles that predicted for SL-RNAs of bothkinetoplastids and nematodes (14). Thus, fran^-splicing of the SLsequence is thought to occur in A.cantonensis as it does in othernematodes.

The SL sequence of both kinetoplastids and nematodes hasbeen found at genomic locations other than the tandemlyreiterated SL-genes participating in rra/u-splicing. These havebeen termed 'orphon' sequences (15). In kinetoplastids, orphonSL sequences are associated with DNA homologous to theintergenic regions of the SL gene repeat. In addition, the SL generepeat may be interrupted by regularly interspersed transposableinserted elements; the CRE1 element in Crithidia fasciculata(16), and the SLACS (17,18) and MAE (19) elements inTrypanosoma brucei. In contrast, orphon SL sequences innematodes are not associated with other regions of the SL generepeat. In Brugia malayi, no transcription of orphon SL sequencesis detected (4) whereas in Onchocerca volvulus transcription fromseveral different DNAs with an orphon SL sequence is seen (20).

In this work we initially set out to identify /ra/w-splicedmRNAs, by screening an A.cantonensis cDNA library with theanti-sense 22 nt SL sequence to identify full length cDNAs withthe r/ww-spliced leader sequence. We report rather the identifica-tion of cDNAs with an internal rather than 5' SL sequence, theisolation of corresponding genomic elements in which the SLsequence forms part of a 198 bp repetitive element, andtranscription from these genomic elements.

MATERIALS AND METHODS

Parasites

The A.cantonensis life-cycle was maintained through Achatinafulica snails and Sprague Dawley rats. Adult A.cantonensis wereextracted from rat pulmonary blood vessels 40 days postinfection.

Nucleic acids methods

Nucleic acids were extracted as described previously (13). AcDNA library was constructed from adult A.cantonensis poly(A)+ RNA in A.GT 11 following standard procedures (21) using anAmersham cDNA construction kit with random hexamer

* To whom correspondence should be addressed

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018

Page 2: Orphon spliced-leader sequences form part of a repetitive element ...

Nucleic Acids Research, 1995, Vol. 23, No. 6 1031

primers. A genomic library was constructed from DNA restrictedpartially with Alul, Rsa\ and HaeU\. Following EcoRl linkerligation and methylation, DNA was ligated to A.GT11. Forsequencing, clones were subcloned into pBluescript KS plasmid(Stratagene) and sequenced by the dideoxy method (22).

Southern blots of genomic or cloned DNA were hybridised in50% formamide, 1 % SDS, 6 x SSC and 100 u.g/ml salmon-spermDNA at 42°C overnight. Blots were washed in 10 x SSC/0.5%SDS; 1 x SSC/0.5% SDS; 0.1 x SSC/0.5% SDS for 20 min eachsequentially at room temperature, at 37°C and at 65°C (1 x SSCwas 150 mM NaCl and 15 mM sodium citrate).

Northern blots were hybridised in 50% formamide, 2 xDenhardt's solution, 0.1% SDS, 5 x SSC and 100 u.g/ml salmonsperm DNA at 42 °C overnight and washed as above. Restrictiondigestions were performed following the manufacturer's instruc-tions (New England Biolabs). Restriction fragments were gelpurified using silica gel (GeneClean, Bio 101).

Probes were labelled with [a-32P]dCTP by random oligo-nucleotide labelling (Pharmacia). Primer extension analyses wereperformed using 1 ng T4 polynucleotide kinase labelled oligo-nucleotides indicated in figure legends and shown in Figure I.[Primers:- 6(1): 5'-GTCGTCAGAAGGCAAATATCCTGCTA-AGAGG-3'; P-4: 5'-TTTTCTTGATTGGCAGTCACAATGC-TGGATAC-3'; G-l 3630/3610: 5'-TTGAAATGATTGCAG-CTGGT-3']. These were hybridised to 10-20 [ig A.cantonensisRNA at 30° C in 40 mM PIPES (pH 6.4), 1 mM EDTA, 0.4 MNaCI and 80% formamide and extended in 50 mM Tris (pH 7.6),75 mM KC1, 3 mM MgCl2) 1 mM each dNTP and 10 mM DTTwith 200 U Moloney Murine Leukaemia Virus reverse transcrip-tase (M-MLV-RT; BRL). Extension products were fractionated in7 M urea—6% acrylamide gels and autoradiographed.

For reverse transcriptase polymerase chain reaction (RT-PCR),10 pmol 3' oligonucleotide primers indicated in figure legendsand shown in Figure 1 were reverse transcribed with 5 (igA.cantonensis RNA in 50 mM KC1,20 mM Tris (pH 8.3), 25 mMMgCl2, 1 mM each dNTP and 200 U M-MLV reverse transcrip-tase. The 5' primers were:G-2 2205/2224, 5'-GCAGGGTAACCGTTCCTGA-3';G-2 2604/2624, 5'-TTCTGTAGCTTGCACTTCCA-3';G-2 2654/2674, 5'-CATTACTGCGTGTTGTTAAT-3';G-2 2705/2725, 5'-CAGGCACTCTAAGCAACAAT-3';G-2 2755/2775, 5'-CGCTGTTCACATTCTTGCAT-3';G-2 2806/2828, 5'-TCGTAGCATTATTCTTACACAA-3'.The 3' primers were:con 54, 5'-CTTTGCTTGTGCCACGCGGGA-3';G-2 3306/3285, 5'-AAGCACCCTCTGGCGACAATTC-3';G-2 3346/3326, 5'-CAGACGACGAACGTTCACGA-3'.

After addition of 5' primers (10 pmol), reactions wereamplified with 2.5 U Taq polymerase (Amplitaq; Perkin-ElmerCetus) for 25 cycles of 94°C, 1 min; 55°C, 1 min and 72°C, 2min. Amplification products were separated in 0.8% agarose,blotted onto nylon membrane and probed with cloned 32P-con-198.

RESULTS

Identification of orphon SL genomic sequences

Initially nine cDNA clones were identified and purified afterscreening a XGT11 cDNA library (~5 x 104 p.f.u.) with ananti-sense 22 nt SL oligonucleotide. These ranged from 1.2 to 4.0

kb and shared substantial (>90%) homology on partial orcomplete sequencing. In all clones the SL 22 nt sequence was notat the 5' end. cDNA clones were not polyadenylated and did notpossess open reading frames (ORFs) coding for polypeptideslonger than 90 amino acids. These features, together with theobservation that one cDNA clone (si 6) showed 95% similarity tothe genomic clone G-2 (see below) raise the possibility that thesecDNA clones resulted from DNA contaminating the RNA usedin cDNA library construction. One cDNA, designated si 233, wasused to screen a A.GT11 genomic library (-5 x 104 p.f.u.) ofA.cantonensis DNA. Fifteen clones were identified, purified andsub-cloned in pBluescript.

Analysis of genomic clones with orphon SL sequences)

After partial or complete sequencing of four of the 15 positivegenomic clones, two clones, G-l and G-2, were selected forfurther analysis. G-2, 3.5 kb long, contained two SL sequences.The upstream of these two SL sequences had one base change, thedownstream sequence was perfectly conserved. In addition, theregion extending 125 bp upstream and 51 bp downstream of thetwo SL sequences showed 95% identity. This 198 nt sequence(i.e. including the 22 nt SL sequence) was designated con-198(Fig. 1) and was found in all cDNA (6) and genomic clones (4)that were completely or partially sequenced. In G-2 (and si 6) thesequences were designated con-198 (upstream) and con-198(downstream). In addition, the sequence 5' of the two con-198sequences in G-2 showed <30% similarity for 140 nt, butthereafter showed 85% similarity in a 900 nt region. Thus, thecon-198 sequence apparently formed part of a degenerate 1.2 kbtandem repeat which had 913 bp of unrelated sequence separatingthe two repeat units. It is worth noting that examination of the G-2sequence (GenBank accession no. U1319O) shows severalfeatures diagnostic of a retrotransposable element (shown in Fig.1). The con-198 sequence would constitute the long terminalrepeats (LTRs) terminating in 5' TG and 3' CA dinucleotides.Flanking the LTRs are imperfect 4 bp repeats (5'-AACA;3'-AAGA) resembling a target site duplication, though theseflank both sides of both con-198 sequences. Close to the 5'putative LTR there is a 20 bp region in which 16 nt arecomplementary to the 3' end of the tRNAleu found in Bacillussteamthermophilus (23). This might serve as the primer bindingsite (PBS) for first strand synthesis. Similarly, upstream of the 3'putative LTR is a 5 bp oligopurine stretch which may serve as theprimer binding site (PB+) for second strand synthesis (see Fig. 1).

The clone G-l was similar to G-2 only in the 198 nt con-198sequence (Fig. 1). This 4.2 kb genomic clone possessed onecon-198 region, close to the 5' end, in which the SL sequenceshowed two base changes compared to the conserved nematodeSL sequence. Apart from the con-198 sequence G-l showed nosimilarity to G-2 except for the 30 nt adjacent to the 3' end of thecon-198 sequence. This 30 nt region shared 85% similarity withthe equivalent 30 nt downstream of the 5' con-198 (con-198upstream) in G-2 (Fig. 1). The regions adjacent to con-198 (both5' and 3') in other sequenced genomic clones showed 89-91%similarity to regions adjacent to con-198 (upstream) in G-2. ThecDNA clones showed similar homology to con-198 (down-stream) in G-2.

Southern blot analysis of A.cantonensis DNA digested with avariety of restriction endonucleases and probed with con-198[cloned following reverse-transcriptase polymerase chain reac-

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018

Page 3: Orphon spliced-leader sequences form part of a repetitive element ...

1032 Nucleic Acids Research, 1995, Vol. 23, No. 6

G TTonrtJcrr To g a m 1 ! 1 'IUJULKTG

i nOflGJftft^ CTGJPCPC aUODLTCT C3TCKJWIT H'THmffF T3CC3W0A

GBVUMUJT CTOconrta1 cwwociar CDCOTWOT Tmtwocr ciuluwnG <TH7r»«r MfjmrrrR rnrnraar^

o-2 ojjtfotftrr axuGinnG GD111111 i i 1 I |

OJ OOWLEWT i x u v a n : IDDOTCCIG

TCTAA33VOXC W O O C T T T T I A CUCR7tftIG P f B O t n I!C OTHUGSG

•"'lUlLJfllT GCMGOCTHT A 3 U i l U i C CilPUJU'iR. QXTtTCHIC OVtl'lUlTlA CTCK3CJLA TUWTDUA

P B SG-2 CTCfftfCPCC TUdUJCA 03tfQ333A CEmOQCC OCJCCQCIC (JLAULTUUGG CR'ntJfltJA C J C U W T C J TEGWUJT AHH33WT JVW1RTTGT

Or2 flTDVULlLLr

o-2 aocnnc

lOWGWG WlftJUM.'!' Gurmuwc OTTWOMA. OUTILMC ooswmm. UI'ILUAKX VWRIUJA

i PC&POf&L KTGtGTIG TDVOU«r M7A7L7G CMTIEMET '

G-2 TCTuarilTJV GuSOCWPC G0CO3EC ATKJtnCCT CH333CITT GOWVtiUL KSKCKWT CrfI7fflIT7ir J

G-2 m n faftgj ift-ff '-H'^'iiHT LftlLftjjJC flTCTWtlCTG TGOdKITT AlTUBPf? m 'iftfl(|l-!J'|' TUIt'OmUT TCOfflTIHA C

O-2

G-2 flH3C3TCIE

0-2 2205/2224

TTU-H TQT" CTSJHUC QQOVCUQA AITVOGC

O-2 QDJOCCIT QJJCIL^l^^tJJJCD^XIJCCD'CG QJirCIMX CJLXJJQJUT T (TfTWITsTTl

G-2 TTCKurnc TOREOCC TTGJDWO1 MKSTWCC AAiiiomA m i a o LLfifluudT Micncanv

0 - 2 2 6 0 t / 2 6 2 t *"

G-2 'I'll 11^1 m r Ni l f r a r r r X3CDVCCA^OVtl'iUUT <I ' IKI ' IU ' [R WtfOtnCIC r r m r r r n r :

G - 2 2705/2725

212)G-2 flm n m r A7CHII:IT wcnmnA icicrjtrotr acarnniG

G-1

G-2 2755/2775

G-2 265*/267t

14 mi-auflgr i rtna n IT TTJOMIEIH1 AZTCTimC

0 - 2 2106/

f -j-f r ̂1 p[ * I r, I t ' OO^tnCIT

PB +CIPKXXXX, CTTUOTiC (JUMJfGi.

WTTPj' N Wr TTOCKTCOr OtftCTICGT

GdXDJJfOV (-pf H 'HjJU 'IBI'ltJ*rri7'IBI'lt<J*rri7

MKUOSMG G1WOWCR

LrtiTiirfCT^. Q33OWIA TlTllULrftr TiTiHiUiTr TDQCIC3C TIJO3?C3V T7CH33CGNUK3KJJZ Ci'lUVOCT

WCSTJGUR TtrCTITWQV C7dC3CT(IA '.,'nk-HHl'lT1G ATI?IPODr3\ TJCSGTJPC /CCPGdt lA AIL7Q.T1L7AG-1 fllrJl-^'UTT: OTKCTWC

G-1 mcicDwtr TGnrjifcic TCTJCI?C Mmumr cmmim: gaamcrr m&tatn0 - 1 3630/3610

ULLHILTIL,

Figure 1. Partial nucleotide sequences of G-2, showing the upstream and downstream con-198 sequences and intervening sequence; the cloned SL gene [pAcSL, ref(13)]; and G-1 (complete sequences are lodged in GenBank with accession nos U13191 for G-1 and Ul3190 for G-2). Identity between clones is shown by verticallines; ends of clones are shown by H. Note that nucleotide numbering, above the line in G-2 and betow the line in G-1 starts from the beginning of the con-198 sequence.In both G-1 and G-2, the con-198 sequence is boxed and the SL sequence is hatched, hi the pAcSL sequence, the 3' end of the gene is shown by an oblique arrow.Primer sites for pnmer extension and PCR (see Materials and Methods and Fig. 5) are marked and arrowed to give orientation. The transcriptional start sites in G-2and G-1 are bracketed at nucleotides 1126 and 3527 respectively and upstream CAT and TATA boxes are underlined (broken). The retrotransposon features in G-2are indicated: bold underline shows 4 bp imperfect repeat; and PBS and PB+ show putative primer binding sites.

tion (RT-PCR) using con-198 specific primers—data not shown]showed hybridisation primarily to single bands (Fig. 2, panel 3).Limited hybridisation to other bands was visible and hybridisa-tion to multiple bands occurred when DNA was digested withenzymes that cut more frequently (lanes 3,5 and 6). Probing withG-2 resulted in hybridisation to multiple bands, and probing withG-1 in hybridisation to one or two bands (Fig. 2, panels 1 and 2).Comparison of panels 1,2 and 3 in Figure 2 shows that the majorhybridisation bands detected by each of the three probes weredifferent. These differences in hybridisation indicate that the noncon-198 regions of G-1 and G-2 occupy different genomic

locations. However, DNA digested with each enzyme showed atleast one band common to all three probes, which presumablyreflects hybridisation to the con-198 sequence. Due to theintensity of the major bands, common bands are faint in somelanes. All three probes gave a radically different pattern ofhybridisation compared to the clorted SL gene [pAcSL, ref (13)]from which the f/ww-spliced SL-RNA is transcribed (Fig. 2,panel 4). Note that hybridisation and washing conditions were toostringent to allow hybridisation of the 22 nt SL sequence only.

We estimated the copy numbers of the con-198 sequence, of theintervening sequence between con-198 (upstream) and con-198

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018

Page 4: Orphon spliced-leader sequences form part of a repetitive element ...

Nucleic Acids Research, 1995, Vol. 23, No. 6 1033

i to too 1000 IOOOO Ing DNAI

a. t=:

23 1-9 4 -6 5 *4 3 -

! •

10-0 8-0 6-

5 6 7 8 9 3 4 5 6 7 8 9

Figure 2. Southern blot analysis of 10 |ig A.cantonensis genomic DNAdigested with restriction endonucleases, named above lanes \-%, or undigested(lane 9). DNA was electrophoresed in 0.8% agarose gels, transferred to nylonmembranes (SchleicherandSchuell) and probed with G-1 (panel l),G-2(panel2), 198 nt con-198 sequence (panel 3) and with the cloned SL gene pAcSL (see13) (panel 4). Hybridisations were performed as in Materials and Methods,final wash was at 65 °C in 0.1 x SSC and 0.5% SDS. Molecular weight markersare shown to the left.

(downstream) in G-2, and of the transcribed region downstreamof con-198 in G-1 (see below). These sequences were isolatedfrom the corresponding clones by restriction digestion with theappropriate enzyme (see Fig. 3) and gel-purified. The 32P-labelled probes were hybridised to known amounts of genomicDNA or plasmids containing G-1, G-2 or the con-198 sequence(Fig. 3). By assuming a genomic complexity of 8 x 107 bp (thecomplexity of C.elegans) for A.cantonensis, a copy number of-120 was estimated for the con-198 sequence. A copy number of5 was estimated for both the G-2 intervening sequence and theG-1 transcribed region. We interpret these data as showing thatthe con-198 sequence is present at multiple (— 100) genomic sites,of which we have characterised two. In G-2, the con-198sequence forms part of a degenerate 1.2 kb tandem repeat whereasin G-1 it is present as a single copy.

Transcription of genomic elements with orphon SLsequences

To confirm transcription from these genomic elements thatcontain the con-198 sequence, a series of Northern blot experi-ments was performed. Probes prepared from G-1, G-2 and thecloned con-198 sequence (see Fig. 4A) were hybridised toA.cantonensis total RNA. When the RNA was separated in 1%formaldehyde-agarose, G-2, G-2H (spanning con-198) and thecloned con-198 sequence hybridised diffusely to RNA rangingfrom 0.2 to 9.0 kb (Fig. 4B, lanes 4,5 and 6). A 0.8 kb band wasvisible in hybridisations with the G-2 probe (Fig. 4B, lane 4).Although G-1 and G-1 X (spanning the con-198 region of G-1)hybridised to a similar smear of RNA (Fig. 4B, lanes 1 and 3), a

A c

con-198

A c

G-1

Ac

G-2

Figure 3. Determination of copy numbers. One ng to 10 ng of A.cantonensis(A.c.) genomic DNA or 1 ng to 1 ng of pBluescript plasmids with G-1, G-2 orLTR sequences were dot-blotted onto nylon membrane and probed with the 198nt con-198 sequence (panel 1), or with restriction fragments encompassingtranscribed regions. For G-1, Kpn\ fragment from nucleotides 2941-4262(panel 2); for G-2, Rsal fragment from nucleotides 2007-3039 were used(panel 3). Autoradiographic signal was quantified by laser densitometry andcopy number determined assuming a genomic complexity of 8 x 107 bp forA.camonensis.

1.8 kb transcript was clearly distinguishable in the G-1 hybridisa-tion. This 1.8 kb transcript hybridised specifically to the G-1 Kprobe (Fig. 4B, lane 2) from the 3' end of the clone, which lackedthe con-198 sequence. When RNA was separated in 10%acrylamide-urea, G-1X, G-2, G-2H and the con-198 sequencehybridised to a set of four relatively low molecular weighttranscripts ranging from 240 to 800 nt (Fig. 4C, lane 3-6). Incontrast, G-1 and G-1 K showed no such hybridisation(overexposure of the G-1 probe showed faint signal to the fourlow molecular weight transcripts, data not shown). These dataindicate that the con-198 sequence is present on multiple RNAsincluding four RNAs ranging from 240 to 800 nt, whereas theregion downstream of the con-198 sequence in G-1 codes for onetranscript of 1.8 kb.

A series of primer extension experiments were performed tomore precisely analyse transcription from G-1 and G-2 (see Figs1 and 5 A for location of primers). Primer P-4 (located immediate-ly downstream of the SL sequence in con-198) yielded a numberof extension products, indicating several transcripts or con-198sequences (Fig. 5B, lane 1). A non con-198 G-2 primer, 6(1),located 520 nt upstream of the 3' con-198 in G-2 resulted in asingle extension product of 37 nt (Fig. 5B, lane 2). This mappedto 596 nt upstream of the terminal TG dinucleotide of the 3'con-198 (see Fig. 1). Note also the faint corresponding extensionproduct of -800 nt in primer extension using a primer from thecon-198 sequence in Figure 5B, lane 1. Partial dideoxy sequenc-ing of this extension product confirmed its origin from the G-2sequence (data not shown). This transcriptional start site in G-2is located 71 bp downstream of a consensus TATA box with aCAT sequence a further 34 bp upstream. This initiation site isconsistent with RT-PCR results when a series of 5' primers from370 to 900 bp upstream of the 3' con-198 sequence were used inconjunction with an anti-sense primer from the con-198 sequence(see Figs 1 and 5 A for primer sites). Primers downstream of theputative transcriptional start site yielded RT-PCR products of thepredicted size but a primer upstream of this start site yielded noproduct (Fig. 5, panel C).

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018

Page 5: Orphon spliced-leader sequences form part of a repetitive element ...

1034 Nucleic Acids Research, 1995, Vol. 23, No. 6

G-1 XG-1 K

GA

O2G 2 H

H1 2 3 4 5 6

G-1

G-2(&-* 2BZ2/349I] (O-B J2tQ/W

ft-2 Z*<H/2«I4-«-0-2 »&4/2t74-w

•-2 270S/27M -*•

0-3 2toe/t*2t-«-

• CM-94 (0*2 0 2 1

•*•« UH/3I1S

B

H=

< -> T J 5 6

Figure 4. (A) Schematic representation of G-1 and G-2 to show con-198 region(shown by oblique lines) and enzyme sites used in probe preparation. In G-l,Xho\ (X) and Kpn\ (K) were used to prepare G-l X, which spanned the LTRsequence, and G-l K, which was located downstream of the LTR sequence. InG-2, Hin Fl (H) was used to prepare G-2 H, spanning the LTR sequence (probesequences shown in bold). (B) Northern blot analysis of A.awtonensis totalRNA electrophoresed in 1% formaldehyde-agarose. (C) A.cantonensis RNAelectrophoresed in 10% acrylamide-urea. In panels B and C RNAs weretransferred to nylon membrane and probed with G-l (lane 1), G-l K (lane 2),G-1 X (lane 3), G-2 (lane 4), G-2 H (lane 5) and con-198 (lane 6). Hybridisationand washing conditions were as described in Materials and Methods.Molecular weight markers are shown to the left of panels B and C.

1 2 3 4 5 6 7 9 10 11 12

The 3' end of the transcript encoded by G-2 was localized byRT-PCR to a region between 110 and 150 bp downstream of theSL sequence. Primers con-54 and G-2 3306/3285, locatedrespectively adjacent to the SL sequence and at 109 bp 3' of theSL sequence, resulted in RT-PCR products of the appropriate sizewhen used with 5' primers up to 490 bp from the con-198sequence, (Fig. 5C, panels 1 and 2). In contrast, an anti-senseprimer located a further 40 bp downstream yielded no amplifica-tion products when used with the same 5' primers (Fig. 5C, panel3). RT-PCR using primers complementary to the opposite strand(complementary to 3' primer G-2 330673285; and 5' primer G-22705/2725 and G-2 2755/2775) resulted in no amplified products(data not shown), confirming that transcription proceeds from theSL-containing strand and not in the opposite direction.

The G-l K fragment that hybridised to a 1.8 kb transcript onNorthern blot analysis (Fig. 4) was located at the 3' end of theclone (nucleotides 2940-^1261). Thus primers from this regionwere used in primer extension analysis. A primer located fromnucleotides 3630-3610 (see Figs 1 and 5 A) resulted in a singleextension product of 75 nt (Fig. 5B, lane 3). This mapped thetranscriptional start site to -3150 nucleotides downstream of thecon-198 sequence at a position 167 bp from a consensus TATAsequence and 207 bp from a CAT box.

Figure S. (A) Primer design for primer extension and RT-PCR analyses;numbers denote distance from 5'end of either G-l or G-2 (see Fig. I for preciseposition and Materials and Methods for sequence of oligonucleotides).(B) Primer extension using P-4 primer (lane 1) i.e. sequence downstream of theSL sequence in the con-198 sequence; primer 6(1) (lane 2); i.e. sequenceupstream of con-198 (downstream) in G-2; and G-l 3630/3610 (lane 3) i.e.sequence 3 kb downstream of con-198 in G-l. (C) RT-PCR analysis of G-2transcript In panels 1, 2 and 3 the 5' primers used were located sequentiallyfurther upstream of the 3' con-198 i.e. lanes 1 and 2, G-2 2205/2224; lanes 3and 4, G-2604/2624; lanes 5 and 6, G-2 2654/2674; lanes 7 and 8, G-22705/2725; lanes 9 and 10, G-2 2755/2775; lanes 11 and 12, G-2 2806/2828.The 3' primer was con-54 (panel 1); G-2 3306/3285 (panel 2); G-2 3346/3326(panel 3). In lanes 1, 3, 5, 7, 9 and 11 M-MLV-RT was added; in lanes 2, 4, 6,8,10 and 12 M-MLV-RT was absent. Reaction products were separated in 0.8%agarose, blotted onto nylon membrane and probed with 32P-con-198. MWmarkers are shown at left

DISCUSSION

Here we describe a 198 bp repetitive element in the A. cantonensisgenome which contains the 22 nt SL sequence that is normallyencoded by the multi-copy SL gene and fraru-spliced onto asubset of mRNAs.

This repetitive element, designated con-198, is found inmultiple copies (>100) that are contiguous with at least twounlinked DNA sequences, described herein as clones G-l andG-2. The non con-198 regions of G-1 and G-2 are present in lowercopy number (~5), thus G-1 and G-2 may not represent the majorgenomic context of the con-198 sequence. However, in the

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018

Page 6: Orphon spliced-leader sequences form part of a repetitive element ...

Nucleic Acids Research, 1995, Vol. 23, No. 6 1035

context exemplified by G-2, the con-198 sequence forms part ofa 1.2 kb tandem repeat, whereas in G-1 it is present as a singlecopy. In several features the G-2 sequence encompassing the twocon-198 sequences resembles a retrotransposable element (Fig.1). However, the absence of ORFs coding for elements homolo-gous to retroviral gag, pol or env genes, the low copy number ofthe con-198 sequence in tandemly reiterated form, and thesimilarity to a degenerate tandem repeat do not allow us todesignate con-198 the LTR of a retrotransposable element.

Rather, we interpret the Northern blot analysis (Fig. 4) asindicating that the con-198 is found on multiple RNAs includinga set of four RNAs ranging from 240 to 800 nt. Thus, primerextension analysis using a primer from the con-198 sequencegave a number of extension products corresponding to RNAtranscripts from a number of DNA sequences with the con-198sequence. Primer extension using a primer upstream of thecon-198 sequence in G-2 gave one extension product correspond-ing to transcription from one DNA region with the con-198sequence, mapping the start site to 726 nt upstream of the SLsequence in con-198. Our results from RT-PCR are consistentwith this interpretation, mapping the 5' end to between 680 and970 nt upstream of the SL sequence in G-2, and mapping the 3'end to between 110 and 150 nt downstream of the SL sequence.Thus transcription from the G-2 genomic sequence results in an858-888 nt transcript with an internal copy of the SL sequence,and consensus TATA and CAT boxes 73 and 106 nt upstream ofthe proposed start site. By comparison, in the DNA exemplifiedby G-1, transcription starts ~3 kb downstream of the con-198sequence. The biological significance of multiple transcripts withan internal copy of the repetitive element designated con-198which includes the 22 nt SL sequence is unclear. Similarly,whether or not the presence of the con-198 sequence in any wayaffects downstream transcription is also unclear.

The curious feature of con-198 is the presence of the SLsequence which is normally /ra/w-spliced onto the 5' end of asubset of nematode mRNAs. Comparison of these A.cantonensiselements containing orphon SL sequences with the O.volvulusand B.malayi DNA elements containing an orphon SL sequenceshows 70% identity in the 65-100 bp flanking the SL sequence(4,20). Thus analogous elements may be found in other nematodespecies.

In kinetoplastid protozoa /nms-splicing is thought to specifythe 5' end during the processing of polycistronic precursors intomonocistronic mature mRNAs (24). Recent evidence indicatesthat it may act similarly during the processing of polycistronicRNAs in C.elegans when SL1 trans-spYices to the first (50transcript and SL 1 or SL2 fra/w-splices to 3' RNAs (25,26). It hasfurther been suggested that the leader sequence functions as anefficient translation initiation site (27), and other functions havebeen proposed (20,28). In A.cantonensis the genomic organisa-tion of orphon SL sequences within a repetitive element and thetranscription of these genomic elements either yielding tran-scripts with an integral SL sequence or transcribing downstreamof the SL sequence suggests a biological role(s) other thanfrans-splicing. This role(s) may be investigated by characterisingtranscripts with an integral SL sequence; determining whether

transcripts are translated and determining whether deletion of theSL and/or con-198 sequence affects transcription or translation.

ACKNOWLEDGEMENTS

We thank Liao Fang-Ming and Yeh Chen-Hsiung for excellenttechnical assistance, Ms Lola Wen for typing and Dr C. Fletcherfor reviewing the manuscript. This work was supported byNational Science Council, Taiwan (NSC-81-0412-B001-04).

REFERENCES

1 Muller, R. (1975) in: Worms and Disease: A Manual of MedicalHelmimhology. W. Heinemann Medical Books Ltd, London.

2 Krause, M., and Hirsh, D. (1987) Cell 49, 753-761.3 Bektesh, S., van Doren K., and Hirsh, D. (1988) Gene Dev. 2, 1277-1283.4 Takacs, A.M., Denker, J.A., Pemne, K.G., Maroney, P.A., and Nilsen, T.W.

(1988) Proc. Natl. Acad. ScL USA 85, 7932-7936.5 Nilsen, T.W., Shambaugh, J., Denker, J.A., Chubb, G., Faser, L., Putnam,

L., and Bennet, K. (1989) Mol. Cell. Biol. 9, 3543-3544.6 Boothroyd J., and Cross, G.A.M. (1982) Gene 20, 281-289.7 Rajkovic, A., Davis, R.E., Simonsen, J.N. and Rothman EM. (1990) Proc.

Natl. Acad. ScL USA 87, 8879-8883.8 Tessier L.H., Keller M , Chan R.L., Foumier R., Weil J.-H. and Imbault P.

(1991) EMBO J. 10, 2621 -2625.9 Huang X.-Y. and Hirsh D. (1989) Proc. Natl. Acad Sci. USA 86,

8640-8644.10 Maroney, P.A., Harmon, G., and Nilsen, T.W. (1990) Proc. Nail. Acad. Sci.

USA 87, 709-713.11 Hannon, GJ., Maroney, PA.. Denker, J.A., and Nilsen, T.W. (1990) Cell

61, 1247-1255.12 Hannon, GJ., Maroney, P.A , and Nilsen, T.W. (1991) J. Biol. Chem. 266,

22792-22795.13 Joshua, G.W.P., Chuang, R.Y., Cheng, S.C., Lin, S.F., Tuan, R.S., and

Wang, C.C. (1991) Mot Biochem. Parasitol. 460, 209-218.14 Bruzik, J.P., van Doren, K., Hirsh, D. and Steitz, J.A. (1983) Nature 335,

559-562.15 Nelson, R.G., Parsons, M., Barr, PJ., Stuart, K., Selkirk, M. and Agabian,

N. (1983) Cell 34,901-909.16 Gabriel A., Yen , TJ., Schwartz, D.C., Smith, C.C, Boeke, J.D.,

Sollner-Webb, B. and Cleveland, D.W. (1990) Mol. Cell Biol. 10,615-624.

17 Aksoy S., Lalor, T.M., Martin, J., Van der Ploeg, L.H.T. and Richards, F.F.(1987) EMBOJ. 6, 3819-3826.

18 Aksoy, S., Willians, S., Chang, S. and Richards, F.F. (1990) Nucleic AcidsRes. 18, 785-792.

19 Carrington, M , Roditi, I. and Williams, R.O. (1987) Nucleic Acids Res.15, 10179-10198.

20 Zeng, W.C., Alarcon, CM., and Donnelson, J.E. (1990) Mol. Cell. Biol. 10,2765-2773.

21 Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: Alaboratory Manual. Cold Spring Harbour University Press, Cold SpringHarbour.

22 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. NatL Acad. Sci.USA 74, 5463-5467.

23 Pixa, G., Dirheimes, G. and Keith, G. (1983) Biochem. Bioph. Res. Co.112, 578-585.

24 Boothroyd, J.C. (1989) in Nucleic Acids and Molecular Biology, Eckstein,F. and Lilley, D.MJ. (eds) Springer, Berlin.

25 Spieth, J., Brooke G., Kuerston, S., Lea, K. and Blumenthal, T. (1993) Cell73,521-532.

26 Zono, D.A.R., Cheng, N.N., Blumethal, T. and Spieth, J. (1994) Nature372, 270-272.

27 Hirsh, D. (1994) Nature 372, 222-223.28 Donnelson, J.E. and Zeng, W. (1990) Parasitol. Today 6, 327-334.

Downloaded from https://academic.oup.com/nar/article-abstract/23/6/1030/2400821by gueston 10 April 2018