Large Exon Size Does Not Limit Splicing In Vivo



MOLECULAR AND CELLULAR BIOLOGY, Mar. 1994, p. 2140-2146, Vol. 14, No. 3
0270-7306/94/$04.00+0
Copyright © 1994, American Society for Microbiology

I-TSUEN CHEN† AND LAWRENCE A. CHASIN*

Department of Biological Sciences, Columbia University, New York, New York 10027

Received 15 September 1993/Returned for modification 22 November 1993/Accepted 30 November 1993

Exon sizes in vertebrate genes are, with a few exceptions, limited to less than 300 bases. It has been proposed that this limitation may derive from the exon definition model of splice site recognition. In this model, a downstream donor site enhances splicing at the upstream acceptor site of the same exon. This enhancement may require contact between factors bound to each end of the exon; an exon size limitation would promote such contact. To test the idea that proximity was required for exon definition, we inserted random DNA fragments from Escherichia coli into a central exon in a three-exon dihydrofolate reductase minigene and tested whether the expanded exons were efficiently spliced. DNA from a plasmid library of expanded minigenes was used to transfect a CHO cell deletion mutant lacking the dhfr locus. PCR analysis of DNA isolated from the pooled stable cotransfectant populations displayed a range of DNA insert sizes from 50 to 1,500 nucleotides. A parallel analysis of the RNA from this population by reverse transcription followed by PCR showed a similar size distribution. Central exons as large as 1,400 bases could be spliced into mRNA. We also tested individual plasmid clones containing exon inserts of defined sizes. The largest exon included in mRNA was 1,200 bases in length, well above the 300-base limit implied by the survey of naturally occurring exons. We conclude that a limitation in exon size is not part of the exon definition mechanism.

A fundamental problem in pre-mRNA splicing is the recognition of splice sites. Almost all introns are bounded by RNA sequences that conform to 5' and 3' splice site consensus sequences (22) and are presumed to have an acceptable branch point sequence. However, this information alone is not sufficient to specify the actual splice sites in the large multi-intron transcripts that typify vertebrate genes. That is, there are many false sites that appear well suited to be the actual sites yet are passed over by the splicing machinery (25, 28). There must be constraints on the definition of a splice site that are imposed either by the three-dimensional structure of the pre-mRNA or by binding requirements of splicing factors that are not evident from a perusal of consensus sequences.

A limitation on exon size is an example of one such constraint of the latter type. Surveys of exons in vertebrates have yielded an average internal exon size of 137 nucleotides (nt), with few internal exons surpassing a limit of 300 nt (15, 30). A mechanistic explanation for this size limitation has been advanced by the finding that the downstream donor site enhances splicing at the upstream acceptor site of the same exon (13, 19, 27). These results led Berget and her colleagues to propose the exon definition model of splicing, in which the exon (in addition to the intron) is recognized as a distinct participant in the splicing process (27). In this model, splicing factors are envisioned as binding to both the 5' and the 3' ends of an internal exon; contact between these two factors would enhance splicing at the acceptor site. The proximity that results from a size limitation on the length of an exon would promote such contact. This idea was tested by expanding the size of the second exon in a two-exon RNA molecule and noting the effect on in vitro splicing: an expansion from 300 to 600 nt reduced splicing efficiency substantially, thus supporting the linkage between exon size and splicing factor interaction (27).

* Corresponding author. Mailing address: 912 Fairchild, Department of Biological Sciences, Columbia University, New York, NY 10027. Phone: (212) 854-4645. Fax: (212) 932-7616. Electronic mail address: [email protected].

† Present address: Laboratory of Molecular Pharmacology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892.

In the present study, we have examined the effect of exon size on splicing in vivo. We started with a three-exon hamster dihydrofolate reductase (DHFR) minigene (dhfr) that was efficiently spliced in stable transfectants of mammalian cells; we then expanded the internal exon by inserting random sequences derived from Escherichia coli. An analysis of the RNA produced by these constructs in transfected cells showed that exons of up to 1,400 nt in length were readily spliced. We conclude that factors other than the exon definition model underlie the apparent size limit of vertebrate exons.

MATERIALS AND METHODS

Plasmid constructions. The dhfr minigene construct pDCH1P has been described previously (6). It contains six exons and the first intron of the gene. To construct plasmid pD2B, PCR was used to amplify a genomic DNA fragment from the cell line RDU6-2; the dhfr mutation in this cell line (36) creates a BstBI site in exon 2. One PCR primer carried an extension bearing a PstI site. The PCR product was digested with PstI to generate a 325-bp PstI fragment that was cloned into the unique PstI site in intron 1 of pDCH1P. The correct orientation and BstBI site of the resulting plasmid, pD2B, were confirmed by restriction mapping and sequencing. The structure of the pD2B plasmid is very similar to that of pD22, a plasmid which was used previously (3), with the exception of the unique BstBI site in the middle exon (position 17 of the exon) and a more abbreviated intron 2 (Fig. 1B).

Total DNA from E. coli DH5 was prepared (14), digested to completion with TaqI, and ligated into the phosphatase-treated BstBI site of pD2B. After transformation by electroporation, ampicillin-resistant colonies were pooled and the heterogeneous plasmid DNA was isolated (pPOP).

About 40 individual colonies from a duplicate bacterial plate were picked, grown, and screened for pD2B insert size by PCR. Briefly, 10 µl of each 2-ml overnight culture was centrifuged for 20 s. The drained pellets were resuspended in 10 µl of water and boiled for 5 min. After centrifugation for 5 min, 1 µl of supernatant was used for PCR. The primers used were from intron 1 and intron 2 (set 1 in Fig. 1B). After determination of the size of the insert by gel electrophoresis, plasmid DNA from eight clones containing exons with sizes of 50 to 1,500 nt was isolated.

Transfection. Either 2 µg of DNA from pooled plasmids (pPOP) or 50 ng of DNA from eight individual clones or the uninserted pD2B was cotransfected with an equal amount of plasmid pNEO-BPV100 (23) into cells of the Chinese hamster ovary (CHO) cell dhfr deletion mutant DG44 (37) by the calcium phosphate method (39). After 9 days of selection for resistance to G418 (400 µg of the active compound per ml), colonies were pooled, expanded, and tested for plasmid integration by PCR.

DNA analysis. Genomic DNA from pooled cotransfectant populations was used for PCR amplification as described previously (3). Two intron primers (a 20-mer and a 28-mer), one beginning 222 bases upstream and the other beginning 74 bases downstream of exon 2A of pD2B, were used as shown in Fig. 1B, set 1. To a standard 50-µl PCR mixture was added 4 µCi of [32P]dATP (3,000 Ci/mmol; Amersham). PCR was performed for 20 to 24 cycles.

RNA analysis. Total RNA was prepared (4) and treated with DNase I under the following conditions (21). A 20-µl reaction mixture containing 4 µg of total RNA, 40 mM MgCl2, 20 U of RNasin, and 6 U of DNase I (Worthington) was incubated at 37°C for 60 min. The reaction was stopped by the addition of 2 µl of 10% sodium dodecyl sulfate, and the reaction mixture was extracted with phenol and precipitated with ethanol. The RNA was then analyzed for the inclusion and exclusion of exons 2A of various sizes by reverse transcription-PCR (RT-PCR) analysis as described previously (3). Three sets of primers (Fig. 1B, sets 2 to 4) for PCR were used. The primers for set 2 were located in exon 1, 139 bases upstream of the 3' end of the exon, and in exon 3, 66 bases downstream of the 5' end of the exon. For set 3, the exon 1 primer was the same, but the 3' primer was located in exon 4, 119 bases downstream of the 5' end of the exon. Primer set 4 comprised the exon joint primers containing the sequences spanning the junctions of exon 1-exon 2A (5'-CCAATGCTCAG/GAACGAATT-3') and exon 2-exon 2A (5'-AATTCGTTC/CTTCCACTGAG-3') of pD2B (the junctions are indicated by slashes). PCR was performed for 22 to 26 cycles, depending on the amount of dhfr mRNA present.
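The composition of the two exon joint primers can be checked with a short script. The following is a minimal sketch and not part of the original methods: it splits each primer at the slash that marks the junction, reports the number of bases on each side, and tests whether the 3' portion of the first primer is the reverse complement of the 5' portion of the second. The helper names `reverse_complement` and `split_at_junction` are illustrative; only the primer sequences come from the text.

```python
# Illustrative check of the two exon joint primers quoted in the text.
# Helper names are hypothetical; only the primer sequences come from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_junction(primer: str) -> tuple:
    """Split a primer written with '/' at the exon-exon junction."""
    left, right = primer.split("/")
    return left, right


PRIMERS = {
    "exon 1-exon 2A": "CCAATGCTCAG/GAACGAATT",
    "exon 2-exon 2A": "AATTCGTTC/CTTCCACTGAG",
}

# Report how many bases of each primer lie on either side of the junction.
for name, primer in PRIMERS.items():
    left, right = split_at_junction(primer)
    print(f"{name}: {len(left)} + {len(right)} = {len(left) + len(right)} nt")

# Compare the exon 2A side of the first primer with the start of the second.
first_3prime = split_at_junction(PRIMERS["exon 1-exon 2A"])[1]
second_5prime = split_at_junction(PRIMERS["exon 2-exon 2A"])[0]
print(reverse_complement(first_3prime) == second_5prime)
```

Both primers are 20 nt long, and the comparison prints `True`. One possible reading, which is an inference and not a statement made in the paper, is that the second primer is written as the antisense strand and that exon 2A and exon 2 begin with the same sequence, as expected for a duplicated exon.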

RESULTS

Experimental design. Our aim was to test the idea that internal exons are limited in size so as to allow contact between splicing factors that are bound to each end of the exon. Our strategy was to insert randomly chosen restriction fragments derived from E. coli DNA into the middle of the internal exon in a three-exon minigene. If exons larger than the limit of 300 nt cannot be spliced, we would expect to see either skipping or nonsplicing of the internal exon when such constructs are transfected into host mammalian cells (Fig. 1A). Failure to splice any particular expanded internal exon could be due to factors other than exon size, such as the formation of secondary structures that mask the splice sites or the chance incorporation of competing cryptic splice site sequences. Therefore, we designed our first experiment to test a large population of insertions, analyzed en masse.

Our starting construct, pD2B, contained an extra copy of dhfr exon 2 together with its intron flanks inserted into the sole intron of a dhfr minigene. The resulting minigene has the structure exon 1-intron 1-exon 2A-intron 2-exons 2 to 6, in which exon 2A is the extra copy of exon 2. The minigene is flanked by genomic dhfr promoter and polyadenylation regions. Exon 2A differs from exon 2 by a single base substitution at position 17 of the exon that creates a unique BstBI site (Fig. 1B). The BstBI site served as an insertion point for E. coli DNA. pD2B was introduced into a dhfr deletion mutant of CHO cells (DG44; see reference 37) by cotransfection with a neo marker. Exon 2A was efficiently included in the mRNA of these stable transfectants (data not shown). Moreover, these cotransfectants exhibited the purine auxotrophy characteristic of a DHFR deficiency. Since only 1 to 2% of wild-type DHFR activity is required for purine prototrophy (1, 7), the skipping of the 50-nt exon 2A must be limited to less than this amount.

FIG. 1. (A) Diagram of the dhfr minigenes used and the experimental scheme. The open boxes on the left represent exon 1, and those on the right represent exons 3 to 6; the filled boxes represent exon 2, either singly or duplicated. The horizontal lines represent introns; the thicker lines indicate duplicated sequences. The angled lines indicate splicing patterns. pDCH1P is a dhfr minigene containing a single intron (6). pD2B was derived from pDCH1P by the insertion of dhfr exon 2 together with surrounding intronic sequences into intron 1. The vertical line in the extra exon 2 (exon 2A in the text; 50 bases in length) denotes a unique BstBI site created by a single base substitution. Upon transfection of pD2B into DHFR-deficient CHO cells (DG44), exon 2A is spliced into the mature mRNA as shown, resulting in an mRNA that is 50 bases longer than that of pDCH1P. The diagram labeled "pD2B + inserts" illustrates the insertion of E. coli DNA TaqI restriction fragments into the BstBI site of exon 2A. The splicing patterns show either correct splicing (bottom) or the skipping of exon 2A (top, with a question mark) that could result from exons that are too large to be spliced. (B) Map of pD2B and pE2. pE2 is a cDNA version of pD2B, with the introns removed. Arrows indicate the PCR primers used. For pD2B, primer set 1 was used for genomic DNA PCR and primer sets 2 and 3 were used for RNA RT-PCR. In pE2, the arrows represent primers for RT-PCR that span the exon 1-exon 2 and exon 2A-exon 2 joints; this primer set (set 4) specifically amplifies those transcripts that have spliced in exon 2A.

Expansion of the internal exon. To insert sequences of various lengths into exon 2A, E. coli DNA was digested with TaqI and the resulting fragments were shotgun-cloned into the matching BstBI site of plasmid pD2B. About 20,000 transformed bacterial colonies were pooled; this heterogeneous population was designated pPOP. The sizes of the inserts in this plasmid population were estimated by PCR amplification of the region spanning exon 2A (by using primer set 1, shown in Fig. 1B). Gel electrophoresis produced a large number of discrete bands visible over a background smear (Fig. 2A, lane 2). The discrete bands may have originated from the preferential growth of some clones during bacterial outgrowth and/or the preferential amplification of a subset of inserts. Some plasmids that had not received a TaqI fragment were also present, as evidenced by the relatively intense band at 373 bp, the size of the product yielded by the original plasmid pD2B (Fig. 2A, lane 4).

RNA splicing in a pooled population of transfectants. Total DNA from the plasmid library was used to transfect a culture of the dhfr deletion mutant DG44, along with a plasmid carrying the neo gene as a selectable marker. Approximately 2,000 G418-resistant stable cotransfectants were pooled and designated T-POP (for transfectant population). The exon 2A region from the total DNA of this cell population was then amplified as for the plasmid population above. Upon gel electrophoresis, a range of PCR products representing exon sizes from 50 to 1,500 bases was revealed (Fig. 2A, lane 3). More than 45 bands of various intensities were detected. The very largest fragments (total length of 1,200 to 1,800 bp) produced only weak bands, possibly because of relatively poor PCR amplification. Since 2,000 transfectant colonies were pooled to generate this population, it is not clear why only 45 discrete bands were found.

Total cellular RNA from this transfectant pool was purified and reverse transcribed, and the resultant cDNA was amplified by PCR (RT-PCR) using primers from exons 1 and 4 (Fig. 1B, set 3). Gel electrophoresis of the products showed a wide distribution of sizes, from 418 to about 1,800 bp (Fig. 2A, lane 6). Overall, the distribution of mRNA sizes among the approximately 30 bands reflected the distribution of insert sizes seen in the DNA and corresponded to expanded exon 2A sizes of up to 1,400 bases. The similar distribution of sizes in PCR products originating from DNA and RNA suggested that exon size in this range did not inhibit exon splicing.

The RT-PCR products amplified with primers from exons 1 and 4 probably represent spliced mRNA. However, molecules larger than 720 bp (the size of a partially spliced molecule with no insert that retained intron 1) could also represent unspliced RNA. To amplify selectively those molecules that had correctly spliced exon 2A to its neighboring exons, we repeated the RT-PCR procedure using primers that spanned the exon-exon joints. The positions of these primers are indicated as set 4 in Fig. 1B on a template of the intronless plasmid pE2. The specificity of this primer pair was demonstrated by its ability to yield the expected PCR product of 72 bp with pE2 as a template and its inability to use as a template either pDCH1P10 (a minigene containing intron 1 as the sole intron) or pD2B (Fig. 2B, lanes 3 to 5). When T-POP RNA was used as a template for RT-PCR, these joint primers generated a wide range of products with a molecular size of up to 1,200 bp (Fig. 2B, lane 2), corresponding to exons of up to this size. This result reproduced the overall size distribution produced by the primers in exons 1 and 4 described above and verified that internal exons as large as 1,200 bases had been correctly included in a spliced product.

FIG. 2. Size distribution of expanded exon 2A sequences in DNA and RNA in CHO cells transfected with a plasmid population. pPOP indicates the plasmid pool containing minigenes with exon 2A inserts of various sizes; T-POP indicates DNA or RNA from CHO cells stably cotransfected with pPOP DNA and a neo marker and selected for G418 resistance. (A) PCR analysis of DNA and RNA. See Fig. 1B for the primers used. Lane 1, molecular size markers of end-labeled φX174 HaeIII fragments, indicated by dashes (1,353, 1,078, 872, and 603 bp). Lane 2, DNA from the plasmid library of exon 2A inserts. Lane 3, DNA from the transfected CHO cell population. Lane 4, DNA from the parental (uninserted) plasmid DNA. Lane 5, DNA from the CHO DG44 host cell, a dhfr deletion mutant. Lane 6, RNA from the transfected CHO cell population. Lane 7, DNA from pDCH1P10, an intronless dhfr minigene, and the derivative pE2, an intronless minigene that contains a tandem duplication of exon 2, used here as size markers. Lane 8, RNA as in lane 6, but no RT used (control). Lanes 2 to 5 show PCR products of DNA, obtained with primers from introns 1 and 2 (Fig. 1B, set 1); the PCR products contain exon 2A plus 323 bp of intron flanks. Lanes 6 and 8 show RT-PCR products of RNA, obtained with primers from exons 1 and 4 (Fig. 1B, set 3); the PCR products contain exon 2A plus 418 bp from adjacent exons. pDCH1P10 and pE2 are molecular size markers which represent exclusion (418 bp) and inclusion (468 bp) of exon 2A. After separation by 5% native polyacrylamide gel electrophoresis, the dried gels were subjected to autoradiography (autoradiograms shown here). Fragments of up to 1.8 kb can be seen in DNA and RNA from T-POP. (B) PCR analysis of RNA and plasmid DNA with joint primers that specifically amplify molecules with a spliced exon 2A. The joint primers are depicted as set 4 in Fig. 1B. Lanes 1 and 6, φX174 HaeIII fragments used as molecular size markers, indicated by dashes (1,353, 1,078, 872, 603, 310, 281, 271, 234, 194, 118, and 72 bp). Lane 2, RNA from the transfected CHO cell population. Lane 3, plasmid pE2 DNA, used as a size marker and as a positive control for the joint primers. Lane 4, plasmid pDCH1P10 DNA, containing intron 1. Lane 5, plasmid pD2B DNA. Lanes 4 and 5 contain negative controls demonstrating the specificity of the joint primers for the removal of introns. Samples in lanes 1 and 2 were separated by 5% polyacrylamide gel electrophoresis and those in lanes 3 to 6 were separated by 6% native polyacrylamide gel electrophoresis. The PCR products contain exon 2A plus 22 bp from the adjacent exons. It can be seen that central exons (exons 2A) as large as 1,200 bases were spliced into mRNA as shown in lane 2.

FIG. 3. DNA and RNA contents of cells transfected with minigenes carrying internal exons of defined sizes. CHO DG44 cells were cotransfected with DNA from individual plasmids and a neo marker. Pools of 20 to 40 G418-resistant transfectants were collected and named according to the size of the expanded exon 2A carried on the plasmid minigene. (A) PCR analysis of DNA in transfectant populations. Markers (M), primers, and gel conditions were as described in the legend to Fig. 2A. T-120 refers to a population transfected with DNA from plasmid p120, etc. The PCR products contain an expanded exon 2A plus 323 bp of intron flanks. Lane 2 contains DNA from plasmid pD2B as a size marker (373 bp). (B) RT-PCR analysis of RNA. Primers from exon 1 and exon 3 were used (Fig. 1B). The PCR products contain exon 2A plus 260 bp from adjacent exons. Lane 2 contains DNA from the intronless minigenes carried by plasmids pDCH1P10 and pE2 as molecular size markers (260 and 310 bp, respectively). The molecular size markers, indicated by dashes, are 1,353, 1,078, 872, 603, 310, 281, and 271 bp. (C) RT-PCR analysis of RNA. The joint primers were used. PCR products contain exon 2A plus 22 bp from adjacent exons. Lanes 2 and 12 contain DNA from plasmid pD2B as a molecular size marker. Bands marked X in lanes 2 and 12 represent artifacts (minor contaminants). Samples in lanes 11 to 13 were separated by 7% denaturing polyacrylamide gel electrophoresis. Samples in lanes 12 and 13 are the same as those in lanes 2 and 3, respectively.
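The conversion from PCR product size to exon 2A size used throughout this section is a subtraction of the flanking sequence contributed by each primer set. The sketch below illustrates that arithmetic with the flank sizes quoted in the legends to Fig. 2 and 3; the dictionary keys and the function name `exon_2a_size` are labels chosen for this illustration, not names from the paper.

```python
# Flank sizes (bp contributed by sequence outside exon 2A), as quoted in the
# legends to Fig. 2 and 3. Keys and function name are illustrative labels.
FLANK_BP = {
    "set1": 323,  # intron 1 / intron 2 primers, genomic DNA PCR
    "set2": 260,  # exon 1 / exon 3 primers, RT-PCR
    "set3": 418,  # exon 1 / exon 4 primers, RT-PCR
    "set4": 22,   # exon joint primers, RT-PCR
}


def exon_2a_size(product_bp: int, primer_set: str) -> int:
    """Estimate the exon 2A size by removing the flank from a product size."""
    size = product_bp - FLANK_BP[primer_set]
    if size < 0:
        raise ValueError("product is smaller than the flank for this primer set")
    return size


# Products reported for the uninserted 50-base exon 2A.
print(exon_2a_size(373, "set1"))   # 50
print(exon_2a_size(310, "set2"))   # 50
print(exon_2a_size(468, "set3"))   # 50
print(exon_2a_size(72, "set4"))    # 50

# Largest products reported for the pooled transfectants (T-POP).
print(exon_2a_size(1800, "set3"))  # 1382
print(exon_2a_size(1200, "set4"))  # 1178
```

The last two values correspond to the "up to 1,400 bases" and "as large as 1,200 bases" figures given in the text, allowing for the rounding inherent in reading sizes from a gel.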

Splicing of RNA produced by individual plasmid clones. To obtain a more detailed documentation of the ability of large exons to be spliced, we screened individual plasmid clones to isolate minigenes containing expanded exons 2A of defined sizes. DNA preparations from each of eight such plasmids were used to cotransfect DG44 cells as described above, and 20 to 40 G418-resistant transfectants were pooled from each transfection. Each plasmid was named according to the size of its exon 2A (e.g., p120) and each transfectant population was similarly named (e.g., T-120). PCR analysis of the DNA from the transfectant populations produced bands of the expected sizes (Fig. 3A). RT-PCR of total RNA from each transfectant population yielded products of the size expected for spliced mRNA molecules in all but one case. p1500 produced no correctly spliced mRNA, yielding instead a molecule that had skipped exon 2A (Fig. 3B, lane 11). p1500 was the only clone of the eight to produce an exon-skipping phenotype. p1000 produced two mRNA molecules: a correctly spliced minor species (Fig. 3B, lane 9, arrowhead) and a major species of a smaller size, probably the result of cryptic splicing within the insert. p1200 also produced two RNA molecules: a major species representing correctly spliced mRNA (Fig. 3B, lane 10, arrowhead) and a minor species of a larger molecular size. The size of the latter species is compatible with its retention of an intron. Consistent with these interpretations, both p1000 and p1200 produced only bands of the expected sizes for correctly spliced products when the joint primers were used in the RT-PCR (Fig. 3C, lanes 8 and 9). As expected, no RT-PCR product was found when the joint primers were used for p1500, since exon 2A is skipped in this transcript. These results with individual plasmids (summarized in Table 1) confirm the conclusion reached on the basis of the population studies, i.e., that internal exons as large as 1,200 bases can be correctly spliced.
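The interpretation of the individual clones can be restated as a simple comparison between the exon 2A size in the DNA and the size derived from the RT-PCR product obtained with the exon 1 and exon 3 primers. The sketch below applies that comparison to the values listed in Table 1. The 5% tolerance for gel sizing error is an assumption made for this illustration and does not come from the paper; the function name and category labels are likewise hypothetical.

```python
# Exon 2A sizes (bases) for individual clones, taken from Table 1:
# (size in DNA, sizes derived from RT-PCR with the exon 1 / exon 3 primers).
CLONES = {
    "p120": (120, [115]),
    "p350": (350, [340]),
    "p365": (365, [360]),
    "p440": (440, [420]),
    "p680": (680, [650]),
    "p1000": (1000, [730, 990]),
    "p1200": (1200, [1190, 1460]),
    "p1500": (1500, [0]),
}

TOLERANCE = 0.05  # assumed gel sizing error (5%); not a value from the paper


def classify(dna_exon: int, rna_exon: int, tolerance: float = TOLERANCE) -> str:
    """Label one RNA species by comparing its exon size with the DNA exon size."""
    if rna_exon == 0:
        return "exon skipped"
    if abs(rna_exon - dna_exon) <= tolerance * dna_exon:
        return "exon included"
    if rna_exon < dna_exon:
        return "shorter than expected (possible cryptic splicing)"
    return "longer than expected (possible intron retention)"


for plasmid, (dna_exon, rna_exons) in CLONES.items():
    for rna_exon in rna_exons:
        print(f"{plasmid}: {rna_exon} -> {classify(dna_exon, rna_exon)}")
```

Under this assumed tolerance, every clone except p1500 yields a species classified as included, p1000 and p1200 each yield one additional aberrant species, and p1500 is classified as skipped, matching the description in the text. The labels in parentheses only echo the explanations the authors offer; the size comparison by itself cannot establish the mechanism.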

DISCUSSION

The exon definition theory for pre-mRNA splicing proposes that splice sites are first recognized by factors that bind to the two ends of internal exons; only then does a second step of intron definition ensue, with the ends of the exons brought together for the covalent splicing events (27). This idea is supported by clear evidence of in vitro stimulation of spliceosome formation and upstream intron splicing by a downstream 5' splice (donor) site (19, 24, 27, 33). In vivo data supporting the exon definition model come from the analysis of splicing mutants at endogenous loci: exon skipping is a predominant splicing phenotype for mutations that border internal exons. That is, mutation at either end of an internal exon prevents its other end from participating in splicing, despite a wild-type sequence at the unaffected end. These observations have been made for a large number of endogenous mammalian genes, culled from a variety of human genetic diseases (reviewed in reference 18), and for multiple examples within a single gene that have been identified by screening mammalian cell mutants at a specific locus (1, 32).

TABLE 1. Summary of exon sizes in DNA and RNA from cotransfectant pools

Size (bases) obtained:

                           With RNA exon primers^b           With RNA joint primers^c
Plasmid   For DNA exon^a   RT-PCR products^d   Exon^e        RT-PCR products^f   Exon^g
pD2B      50               310                 50            72                  50
NS5^h     50               260                 0 (skipped)
pPOP      50-1,500         418-1,800           0-1,380       72-1,200            50-1,180
          (many bands)     (many bands)                      (many bands)
p120      120              375                 115           140                 120
p350      350              600                 340           370                 350
p365      365              620                 360           385                 360
p440      440              680                 420           465                 440
p680      680              910                 650           710                 690
p1000     1,000            990                 730           1,000               980
                           1,250               990
p1200     1,200            1,450               1,190         1,205               1,180
                           1,720               1,460
p1500     1,500            260                 0 (skipped)   0 (no band)         0

^a Sizes were estimated from polyacrylamide gel electrophoresis of PCR products of genomic DNA, as shown in Fig. 3A.
^b Primers used corresponded to sequences in exon 1 and exon 3 (Fig. 1B, set 2), except for pPOP, for which primers in exons 1 and 4 were used (Fig. 1B, set 3).
^c Primers used corresponded to sequences at the exon 1-exon 2A junction and exon 2A-exon 2 junction, as shown in Fig. 1B, set 4.
^d Sizes were estimated from the autoradiogram in Fig. 3B.
^e Exon 2A sizes were calculated by subtracting a flank size of 260 bp from the RT-PCR products, with the exception of pPOP exons, which were calculated by subtracting 418 bp.
^f Sizes were estimated from the autoradiogram in Fig. 3C.
^g Exon 2A sizes were calculated by subtracting a flank size of 22 bp from the RT-PCR products.
^h A splicing mutant with an exon-skipping phenotype, isolated from the CHO NB6 cell line (3).

Robberson et al. (27) have argued that the requirement forinteraction between the two ends of an exon could limit thesize of internal exons, pointing out that the vertebrate exonsrarely exceed 300 bases (15). Experimental evidence for thissize limitation was also presented: expansion of a second exonto 496 or 600 bases reduced in vitro splicing.Here we extended this test of an exon size limit to an in vivo

context and to a much larger number of constructs. A library ofE. coli sequences was used to expand the middle exon of athree-exon construct, and the splicing of dozens of thesemolecules in permanent transfectants of CHO cells was deter-mined. Surprisingly, we found no evidence for an exon sizelimit. Examination of a pool of molecules showed that the sizedistribution of the exons that were spliced into mRNA mir-rored the size distribution of the exons present in the DNApopulation, with exons as large as 1,400 bases included inspliced molecules. The lack of bias in favor of smaller exonsindicates that exon size is not a limiting factor in the identifi-cation of an exon for splicing, at least not in this size range.There are several possible explanations for the difference

between our results and those of Robberson et al. (27) citedabove. First, the discrepancy may be only apparent. Robbersonet al. (27) described only two constructs that produced aninhibitory effect on the splicing of two-exon molecules: splicingcould have been affected by the quality rather than the lengthof the added sequences in these molecules. Indeed, it was forthis reason that we designed our experiment to test a relatively

large number of different exon inserts. Second, splicing in vitromay be more sensitive to exon size than splicing in vivo, forreasons that need not be linked to exon definition (e.g.,because of incorrect heterogeneous nuclear ribonucleoproteincomposition or altered ratios of splicing factors). Third, theexon size effect may depend on the particular molecule beingstudied. This third point raises the possibility that our partic-ular test construct may not be subject to the exon definitionrule. Our previous results argue against such an immunity. Theexon expanded here is exon 2 of the hamster dhfr gene. CHOcell mutants that have suffered base substitution mutations atthe 3' or the 5' splice site bordering exon 2 in the endogenousdhfr gene exhibited clean and total skipping of the exon (1, 2),as expected for exon definition. Moreover, we have used a verysimilar three-exon minigene construct, containing an extraexon 2 as a middle exon, in a near-saturation mutagenesis ofthe splice sites surrounding this exon. Single base substitutionsat 14 different positions in the 3' and 5' flanks of this exonproduced a complete or partial exon-skipping phenotype (3).The construct used in the present work differed from theconstruct used previously by having a shorter intron 2: 275versus 734 bases. However, exon skipping also readily occurredin RNA produced by the construct used here, as evidenced bythe intense band representing the product of exon skipping inthe transfectants carrying the pooled plasmid population (bot-tom of lane 6 in Fig. 2A) and by the complete exon-skippingphenotype of one plasmid with a 1,500-base exon (Fig. 3B, lane11). Thus, many of the inserts did inhibit splicing here; it is justthat no correlation with exon size was seen.Although our results militate against a size limitation (of up

to 1,400 bases) dictated by an exon definition mechanism, theyin no way speak against the validity of the exon definitionmodel itself. One can easily imagine that factors bound to thetwo ends of an exon can communicate across an exon of greatsize, with the intervening exon sequences simply looped out.One might also consider that since splicing of multi-introntranscripts can proceed temporally from the interior of the

MOL. CELL. BIOL.

Page 6: Large Exon Size Does Not Limit Splicing In Vivo

SPLICING OF LARGE EXONS 2145

molecule (16, 20, 35, 38), exon definition must deal naturallywith large exons that are created by the accretion of individualexons. However, this argument is weakened by the possibilitythat exon definition could precede all actual splicing events.

Perhaps the most satisfying aspect of the idea that exon definition limits exon size is that this notion could conveniently explain the limitation in exon size that has been observed in nature. The average internal vertebrate exon is 137 bases long (15, 30), and very few internal exons of greater than 300 bases have been described. These few could require exceptional mechanisms for recognition, as might also be postulated for introns with nonconsensus splice site sequences. If exon definition does not limit exon size, then what does? The answer may lie in the evolutionary origins of exons as elements coding for protein domains that exhibit discrete functions (8, 11). Although terminal exons are longer and more variable, the average length of their protein-coding sequences is in the same range as those of internal exons (15, 34). Moreover, terminal exons carry out functions in addition to coding for proteins. The 5' untranslated region must be large enough to accommodate sequences necessary for efficient translation initiation (17) and in some cases the regulation of translation (5, 12). The 3' untranslated region must include sequences not only for polyadenylation but also for the regulation of mRNA stability (29) and perhaps for gene regulation (26).

As discussed above, many of the insertions in the pPOP pool did in fact interfere with splicing by causing skipping of exon 2A and producing the short RT-PCR product band seen in lane 6 of Fig. 2A. Interference resulting in simple nonsplicing could not be distinguished, since the sizes of molecules that retained an intron would overlap with those that spliced in exons with inserts. We also obtained genetic evidence for exon skipping produced by a subset of these insertions: inserts that cause exon 2A skipping yield active DHFR, a phenotype that can be selected for on the basis of purine prototrophy (3). In one experiment of this type, DHFR-positive exon 2A-skipping transfectants were generated at high frequency, starting with the pPOP pool (data not shown). Thus, it is evident that foreign RNA sequences can poison the splicing process, as has been noted previously by others (9, 10, 31). Inserts of foreign RNA designed to sequester splice sites in RNA secondary structures can inhibit splicing (9, 31), and one can easily imagine that this mechanism generally underlies such interference. An analysis of inserts that cause skipping of exon 2A here may provide useful examples of how secondary structures can interfere with splice site recognition.

The ready isolation of inserts that do cause exon skipping raises a final cautionary argument. Inhibition of splicing by a set of lengthy sequences inserted into exons does not necessarily mean that length per se is the determining factor. One would expect the chance of including a poison sequence to be proportional to the length of the sequence inserted. Thus, it is not surprising that of the eight individual plasmids we tested, partial or complete splicing abnormalities were found only in the three with the longest inserts (p1000, p1200, and p1500). We do not consider this strong evidence for an effect of size, since the splicing interference may be attributable to the chance of including a poison sequence. We believe that our evidence for the converse, i.e., that the ability of long exons to be spliced argues against a size limitation for splicing, is more telling.
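The expectation that longer inserts are more likely to carry a poison sequence can be illustrated with a simple calculation. The sketch below is not from the paper: it assumes, purely for illustration, that poison elements arise independently at a constant rate per base, and the rate of 1 per 2,000 bases is an arbitrary hypothetical value, as are the function and variable names. Under that assumption the chance of at least one poison element is 1 - (1 - r)^L, which is approximately proportional to the insert length L when r × L is small.

```python
def p_poison(length_bases, rate_per_base):
    """Chance that an insert of the given length contains at least one
    poison element, assuming independent occurrence at each base."""
    return 1.0 - (1.0 - rate_per_base) ** length_bases


# Hypothetical rate, chosen only for illustration (not a measured value).
RATE = 1.0 / 2000.0

# Nominal insert sizes of the eight individual plasmids in Table 1.
INSERT_SIZES = [120, 350, 365, 440, 680, 1000, 1200, 1500]

probs = [p_poison(n, RATE) for n in INSERT_SIZES]

for n, p in zip(INSERT_SIZES, probs):
    print(f"{n:>5} bases: P(at least one poison element) = {p:.2f}")
```

Under any model of this kind the longest inserts are the most likely to show splicing interference even if length itself has no effect, which is the caution raised above.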

ACKNOWLEDGMENTS

This work was supported by Public Health Service grant GM-22629 from the National Institute of General Medical Sciences. I.-T.C. was supported by National Cancer Institute Research fellowship CA08844.

REFERENCES

1. Carothers, A. M., G. Urlaub, D. Grunberger, and L. A. Chasin. 1993. Splicing mutants and their second-site suppressors at the dihydrofolate reductase locus in Chinese hamster ovary cells. Mol. Cell. Biol. 13:5085-5098.

2. Carothers, A. M., G. Urlaub, D. Mucha, D. Grunberger, and L. A. Chasin. 1989. Point mutation analysis in a mammalian gene: rapid preparation of total RNA, PCR amplification of cDNA, and Taq sequencing by a novel method. BioTechniques 7:494-499.

3. Chen, I.-T., and L. A. Chasin. 1993. Direct selection for mutations affecting specific splice sites in a hamster dihydrofolate reductase minigene. Mol. Cell. Biol. 13:289-300.

4. Chirgwin, J. M., A. E. Przybyla, R. J. MacDonald, and W. J. Rutter. 1979. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18:5294-5299.

5. Chu, E., D. Voeller, D. M. Koeller, J. C. Drake, C. H. Takimoto, G. F. Maley, and C. J. Allegra. 1993. Identification of an RNA binding site for human thymidylate synthase. Proc. Natl. Acad. Sci. USA 90:517-521.

6. Ciudad, C. J., G. Urlaub, and L. A. Chasin. 1988. Deletion analysis of the Chinese hamster dihydrofolate reductase gene promoter. J. Biol. Chem. 263:16274-16282.

7. Crouse, G. F., R. N. McEwan, and M. L. Pearson. 1983. Expression and amplification of engineered mouse dihydrofolate reductase minigenes. Mol. Cell. Biol. 3:257-266.

8. Darnell, J. E. 1978. Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257-1260.

9. Eperon, L. P., I. R. Graham, A. D. Griffiths, and I. C. Eperon. 1988. Effects of RNA secondary structure on alternative splicing of pre-mRNA: is folding limited to a region behind the transcribing RNA polymerase? Cell 54:393-401.

10. Furdon, P. J., and R. Kole. 1988. The length of the downstream exon and the substitution of specific sequences affect pre-mRNA splicing in vitro. Mol. Cell. Biol. 8:860-866.

11. Gilbert, W. 1985. Genes in pieces revisited. Science 228:823-824.

12. Goossen, B., and M. W. Hentze. 1992. Position is the critical determinant for function of iron-responsive elements as translational regulators. Mol. Cell. Biol. 12:1959-1966.

13. Grabowski, P. J., F.-U. H. Nasim, H.-C. Kuo, and R. Burch. 1991. Combinatorial splicing of exon pairs by two-site binding of U1 small nuclear ribonucleoprotein particle. Mol. Cell. Biol. 11:5919-5928.

14. Grimberg, J., S. Maguire, and L. Belluscio. 1989. A simple method for the preparation of plasmid and chromosomal E. coli DNA. Nucleic Acids Res. 17:8893.

15. Hawkins, J. D. 1988. A survey on intron and exon lengths. Nucleic Acids Res. 16:9893-9908.

16. Kessler, O., Y. Jiang, and L. A. Chasin. 1993. Order of intron removal during splicing of endogenous adenine phosphoribosyltransferase and dihydrofolate reductase pre-mRNA. Mol. Cell. Biol. 13:6211-6222.

17. Kozak, M. 1987. An analysis of the 5'-noncoding sequences of 699 vertebrate messenger RNAs. Nucleic Acids Res. 15:8125-8148.

18. Krawczak, M., J. Reiss, and D. N. Cooper. 1992. The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 90:41-54.

19. Kreivi, J.-P., K. Zerivitz, and G. Akusjärvi. 1991. A U1 snRNA binding site improves the efficiency of in vitro pre-mRNA splicing. Nucleic Acids Res. 19:6956.

20. Lang, K. M., and R. A. Spritz. 1987. In vitro splicing pathways of pre-mRNAs containing multiple intervening sequences. Mol. Cell. Biol. 7:3428-3437.

21. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

22. Mount, S. M. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10:459-472.

23. Mulligan, R. C., and P. Berg. 1980. Expression of a bacterial gene in mammalian cells. Science 209:1422-1427.

24. Nasim, F. H., P. A. Spears, H. M. Hoffmann, H. Kuo, and P. J. Grabowski. 1990. A sequential splicing mechanism promotes selection of an optional exon by repositioning a downstream 5' splice site in preprotachykinin pre-mRNA. Genes Dev. 4:1172-1184.

25. Oshima, Y., and Y. Gotoh. 1987. Signals for the selection of a splice site in pre-mRNA. J. Mol. Biol. 195:247-259.

26. Rastinejad, F., and H. M. Blau. 1993. Genetic complementation reveals a novel regulatory role for 3' untranslated regions in growth and differentiation. Cell 72:903-917.

27. Robberson, B. L., G. J. Cote, and S. M. Berget. 1990. Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol. 10:84-94.

28. Senapathy, P., M. P. Shapiro, and N. Harris. 1990. Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183:252-278.

29. Shaw, G., and R. Kamen. 1986. A conserved AU sequence from the 3' untranslated region of GM-CSF mRNA mediates selective mRNA degradation. Cell 46:659-667.

30. Smith, M. W. 1988. Structure of vertebrate genes: a statistical analysis implicating selection. J. Mol. Evol. 27:45-55.

31. Solnick, D. 1985. Alternative splicing caused by RNA secondary structure. Cell 43:667-676.

32. Steingrimsdottir, H., G. Rowley, G. Dorado, J. Cole, and A. R. Lehmann. 1992. Mutations which alter splicing in the human hypoxanthine guanine phosphoribosyltransferase gene. Nucleic Acids Res. 20:1201-1208.

33. Talerico, M., and S. M. Berget. 1990. Effect of 5' splice site mutations on splicing of the preceding intron. Mol. Cell. Biol. 10:6299-6305.

34. Traut, T. W. 1988. Do exons code for structural or functional units in proteins? Proc. Natl. Acad. Sci. USA 85:2944-2948.

35. Tsai, M.-J., A. C. Ting, J. L. Nordstrom, W. Zimmer, and B. W. O'Malley. 1980. Processing of high molecular weight ovalbumin and ovomucoid precursor RNAs to messenger RNA. Cell 22:219-230.

36. Urlaub, G., P. J. Mitchell, C. J. Ciudad, and L. A. Chasin. 1989. Nonsense mutations in the dihydrofolate reductase gene affect RNA processing. Mol. Cell. Biol. 9:2868-2880.

37. Urlaub, G., P. J. Mitchell, E. Kas, L. A. Chasin, V. L. Funanage, T. T. Myoda, and J. L. Hamlin. 1986. The effect of gamma rays at the dihydrofolate reductase locus: deletions and inversions. Somatic Cell Mol. Genet. 12:555-566.

38. Weil, D., S. Brosset, and F. Dautry. 1990. RNA processing is a limiting step for murine tumor necrosis factor β expression in response to interleukin-2. Mol. Cell. Biol. 10:5865-5875.

39. Wigler, M., A. Pellicer, S. Silverstein, G. Urlaub, R. Axel, and L. A. Chasin. 1979. Transformation of the APRT locus in mammalian cells. Proc. Natl. Acad. Sci. USA 76:1373-1376.
