Global analysis of trans-splicing in Drosophila
C. Joel McManus, Michael O. Duff, Jodi Eipper-Mains, and Brenton R. Graveley1
Department of Genetics and Developmental Biology, University of Connecticut Stem Cell Institute, University of Connecticut Health Center, Farmington, CT 06030-3301
Communicated by Tom Maniatis, Columbia University Medical Center, New York, NY, June 8, 2010 (received for review April 20, 2010)
Precursor mRNA (pre-mRNA) splicing can join exons contained on either a single pre-mRNA (cis) or separate pre-mRNAs (trans). Trans-splicing between protein-coding exons is exceedingly rare and has been demonstrated for only two Drosophila genes: mod(mdg4) and lola. It has also been suggested that trans-splicing is a mechanism for the generation of chimeric RNA products containing sequence from multiple distant genomic sites. Because most high-throughput approaches cannot distinguish cis- and trans-splicing events, the extent to which trans-splicing occurs between protein-coding exons in any organism is unknown. Here, we used paired-end deep sequencing of mRNA to identify genes that undergo trans-splicing in Drosophila interspecies hybrids. We did not observe credible evidence for the existence of chimeric RNAs generated by trans-splicing of RNAs transcribed from distant genomic loci. Rather, our data suggest that experimental artifacts are the source of most, if not all, apparent chimeric RNA products. We did, however, identify 80 genes that appear to undergo trans-splicing between homologous alleles and can be classified into three categories based on their organization: (i) genes with multiple 3′ terminal exons, (ii) genes with multiple first exons, and (iii) genes with very large introns, often containing other genes. Our results suggest that trans-splicing between homologous alleles occurs more commonly in Drosophila than previously believed and may facilitate expression of architecturally complex genes.
chimeric RNA | RNA-seq | genomics | bioinformatics | deep sequencing
Precursor mRNA (pre-mRNA) splicing is an essential process in eukaryotic gene expression. Splicing can occur either within a single pre-mRNA (in cis) or between two different pre-mRNAs (in trans) (1, 2). The best-characterized form of trans-splicing occurs commonly in nematodes and trypanosomes. In these organisms, spliced-leader RNAs are added to the 5′ ends of many, if not all, pre-mRNAs (3, 4). Examples of trans-splicing that do not involve spliced-leader RNAs, but rather occur between coding exons, are exceedingly rare, and only two Drosophila genes are known to be trans-spliced: mod(mdg4) (5, 6) and lola (7).
The Drosophila genes mod(mdg4) and lola both contain common 5′ exons and multiple alternative 3′ terminal exons. Although the exons of mod(mdg4) are encoded on both DNA strands (5, 6), and therefore require trans-splicing, all of the lola exons are encoded on the same DNA strand (7), suggesting that they are cis-spliced. However, interallelic complementation studies have demonstrated that at least some lola isoforms are generated by trans-splicing (7). This finding demonstrates that trans-spliced genes cannot be identified based on their genomic organization alone, and raises the possibility that other Drosophila genes could use trans-splicing for mRNA synthesis.
Trans-splicing may also be a mechanism for the generation of so-called chimeric RNAs, which contain sequences originating from distant genomic loci (8). However, apparent chimeric RNAs can also be generated by homology-driven template switching during RT-PCR (9–11), and adequate controls are needed to identify these experimental artifacts. One of the more complete reports describing chimeric RNAs found an enrichment of short homologous sequences (SHSs) at chimeric RNA junction sites (12). Although the authors suggested that cellular RNA polymerases switch DNA templates at SHSs (12), RT-PCR strand-switching at SHSs is a more likely explanation, given that both reverse transcriptase and Taq DNA polymerase are known to strand-switch and multiple amplification cycles were used. A more recent study described the existence of several hundred chimeric RNAs in the rice transcriptome; however, control experiments to eliminate strand-switching as an explanation were not provided (13).
We used high-throughput sequencing of Drosophila hybrid mRNA and a mixed-mRNA negative control sample to investigate the extent and specificity of trans-splicing. The trans-splicing of mod(mdg4) and lola was extremely specific, as no chimeric products between these two genes were observed. In addition, 80 other candidate trans-spliced genes were identified, 6 of which were validated. These newly identified trans-spliced genes have complex genomic architecture, suggesting that trans-splicing may facilitate expression of genes whose structure would otherwise pose challenges to the gene-expression machinery. Finally, we report a high background of chimeric mRNA products in our negative control sample, which suggests that mRNAs that appear to link distant genomic loci likely result from experimental errors.
Author contributions: C.J.M. and B.R.G. designed research; C.J.M. performed research; C.J.M., M.O.D., and J.E.-M. contributed new reagents/analytic tools; C.J.M., M.O.D., and B.R.G. analyzed data; and C.J.M. and B.R.G. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE20421).
1To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1007586107/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1007586107 PNAS | July 20, 2010 | vol. 107 | no. 29 | 12975–12979
Results
Paired-End mRNA-seq to Identify trans-Spliced Genes. To search for additional trans-spliced genes, we performed paired-end deep sequencing of mRNA isolated from F1 hybrid progeny generated from crossing Drosophila melanogaster females to Drosophila sechellia males (Fig. 1). These species were chosen because their genome assemblies are of sufficient quality and these two species have sufficient sequence divergence (∼2–3% across annotated genes) to map RNA-seq reads allele-specifically. To differentiate trans-spliced RNAs generated in the animal from chimeric products generated through library preparation artifacts (9–11) or sequencing errors, we also sequenced a negative control library prepared by mixing equal amounts of RNA isolated from the D. melanogaster and D. sechellia parents. We obtained 49 and 54 million mate-pairs from the control and hybrid libraries, respectively. All reads were separately aligned to both the D. melanogaster and D. sechellia genomes to identify reads that mapped perfectly (without mismatches) and uniquely to only one species. This alignment resulted in 9,815,247 hybrid and 9,198,164 control mate-pairs, where both reads were species-specific. Mate-pairs where both reads map to the same species are referred to as cis–mate-pairs (9,678,331 hybrid and 9,069,982 control mate-pairs). In contrast, mate-pairs where each read maps to a different species are referred to as trans–mate-pairs (136,916 hybrid and 128,182 control mate-pairs). We next mapped the reads in the cis– and trans–mate-pairs to exons of protein-coding genes. Mate-pairs in which the two reads mapped to different exons (either within the same gene or between different annotated genes) were considered as candidates for being generated by splicing (49.4% of the hybrid and 56.4% of the control mate-pairs).
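The cis/trans bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names and the `"mel"`/`"sec"` labels are hypothetical, and each mate-pair is assumed to arrive as a pair of species labels for reads that have already passed the species-specific filter.

```python
from collections import Counter

def classify_mate_pair(species_read1, species_read2):
    """Return 'cis' if both reads map to the same species, otherwise 'trans'."""
    return "cis" if species_read1 == species_read2 else "trans"

def summarize_mate_pairs(pairs):
    """Count cis and trans mate-pairs and report the trans fraction.

    `pairs` is an iterable of (species_read1, species_read2) labels,
    e.g. ("mel", "sec"). The labels are placeholders for illustration.
    """
    counts = Counter(classify_mate_pair(a, b) for a, b in pairs)
    total = counts["cis"] + counts["trans"]
    return {
        "cis": counts["cis"],
        "trans": counts["trans"],
        "trans_fraction": counts["trans"] / total if total else 0.0,
    }
```

Applied to the counts reported above, trans–mate-pairs make up roughly 1.4% of the species-specific mate-pairs in both the hybrid (136,916 of 9,815,247) and control (128,182 of 9,198,164) libraries.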
Frequency and Specificity of mod(mdg4) and lola trans-Splicing. Examination of the two known trans-spliced genes, mod(mdg4) and lola, revealed that this approach can indeed identify trans-splicing events. For mod(mdg4), we obtained 50 trans–mate-pairs from the hybrid but only 2 from the control (Fig. S1). Similarly, for lola we obtained 43 trans–mate-pairs from the hybrid library and none from the control library. Importantly, six mod(mdg4) and four lola trans-splicing events, including one previously identified event in mod(mdg4), were supported by as few as one trans–mate-pair.
Previous studies demonstrated trans-splicing for only 6 of 28 and 4 of 22 known mod(mdg4) and lola 3′ terminal exon groups, and many trans-spliced products were detected only in the context of overexpressed transgenes, which may not reflect natural phenomena (5–7, 14). Our results show that 22 of 24 (92%) and 12 of 17 (71%) of the expressed, annotated mod(mdg4) and lola isoforms, respectively, have at least one trans–mate-pair or reside on the antisense strand, and are therefore trans-spliced (Fig. 2). Thus, mod(mdg4) and lola mRNAs appear to be generated almost entirely by trans-splicing.
As the mod(mdg4) and lola 3′ terminal exons all have the same reading frame, chimeric mRNAs synthesized by trans-splicing of mod(mdg4) common exons to lola variable exons (and vice versa) would be refractory to nonsense-mediated decay. We therefore assessed the frequency of aberrant mod(mdg4) and lola trans-splicing by searching for mate-pairs between mod(mdg4) and lola. Importantly, we did not observe any mate-pairs from either the same or opposite species between mod(mdg4) and lola in the hybrid dataset. Furthermore, although we did observe some single mate-pairs between mod(mdg4) or lola and other genes, these were more prevalent in the control (49 mate-pairs) than in the hybrid (26 mate-pairs), suggesting that these are most likely artifacts (Table S1). Thus, trans-splicing of mod(mdg4) and lola is highly specific.
Detection and Validation of Novel trans-Splicing Events. We next searched for new examples of trans-splicing within the same gene. Two thousand one hundred seventy-seven genes had at least one trans–mate-pair and were considered candidate trans-spliced genes. However, several factors including strand-switching, deep sequencing errors, or reference genome errors resulted in false-positives (Fig. S2). We therefore visually evaluated each candidate gene in a genome browser to remove those with potential false-positive signals (see Materials and Methods, Fig. S2, and Tables S2 and S3). This visual curation step resulted in a final collection of 80 trans-splicing candidate genes.
We used a species-specific RT-PCR/sequencing assay (15) to validate the existence of trans-spliced mRNAs for mod(mdg4), lola, and six candidate genes. To confirm trans-splicing, we required that an RT-PCR product was obtained from the hybrid RNA, but not from the individual parents or the control. The hybrid RT-PCR products were cloned and sequenced to verify that SNPs between the primers and the exon boundaries showed a clean transition at exon-exon junctions. Using these stringent criteria, we confirmed trans-splicing for three undocumented isoforms from mod(mdg4) and lola, and all of the tested candidate genes (Fig. 3 and Fig. S3).
Fig. 1. Deep sequencing to search for trans-spliced genes and chimeric RNAs. Sequencing libraries were prepared from poly(A)-selected RNA from F1 hybrids of D. melanogaster and D. sechellia, and from a mixture of parental RNA (control). Libraries were subjected to paired-end deep sequencing, and species-specific sequence reads were identified by comparing genomic alignments. Sequence mate-pairs in which both reads mapped to the same (cis) or different (trans) species were mapped to genes to identify pairs indicative of splicing.
Candidate Chimeric RNA Products Carry Hallmarks of RT-PCR Artifacts. We searched for cases of trans-splicing of exons located in different annotated genes on the same chromosome or on different chromosomes. As with mod(mdg4) and lola trans-splicing, we expect that the genes involved in any new cases would be specific (not promiscuous), would involve splicing of RNA derived from the transcribed strand of the annotated isoforms, and would not involve genes from the mitochondrial genome. Of the 128,958 pairs of genes connected by at least one intergenic mate-pair, 74,383 (58%) had at least one mate-pair derived from the noncoding strand and 1,307 (1%) involved genes from the mitochondrial genome. Nearly all (54,558) of the remaining 54,575 gene pairs were promiscuous, in that at least one of the genes in a pair was involved in more than one intergenic pairing. Strikingly, 16 of the 17 coding, nonpromiscuous intergenic pairs involved single mate-pairs between adjacent or nested genes on the same chromosome, and none of these were trans–mate-pairs (opposite allele pairs), suggesting the genes connected by these mate-pairs may be misannotated, are part of the same transcription unit, and are therefore actually cases of intragenic cis-splicing (Table S4). The remaining coding, nonpromiscuous intergenic gene pair involves a single cis–mate-pair between two paralogs of His3 (CG33845 and CG33821) that differ in sequence by a single nucleotide, suggesting that this mate-pair resulted from a sequencing error. Given these results, we next investigated whether the intergenic trans–mate-pairs in our dataset could result from strand-switching artifacts generated by RT-PCR during library preparation.
Strand-switching is dependent on two major factors: template homology and concentration. We found that the most frequent cases of intergenic trans–mate-pairs involved different members of highly homologous gene families. For example, Actin paralogs located on different chromosomes were the most abundant intergenic mate-pairs in our dataset. We also observed strong correlations between gene template concentration, measured in total mapped reads, and the number of intergenic trans–mate-pairs (Pearson’s r = 0.88 and 0.81 for different genes on different chromosomes and on the same chromosome, respectively). For comparison, the correlation between template concentration and same-gene trans–mate-pairs was relatively weak (Pearson’s r = 0.33). Finally, we compared the tissue-specific expression patterns of the interchromosomal gene pairs to examine whether mRNAs from these genes were expressed in the same tissues (16). We find that ∼7.4% of interchromosomal gene pairs are not coexpressed in D. melanogaster (Table S5). Together, these observations suggest that the vast majority of intergenic trans–mate-pairs are derived from RT-PCR strand-switching artifacts and sequencing errors. Thus, we do not find reliable evidence of chimeric RNA production in adult Drosophila.
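The three exclusion criteria applied to intergenic gene pairs (noncoding-strand mate-pairs, mitochondrial genes, and promiscuity) could be expressed roughly as follows. This is a hypothetical sketch: the record fields and function name are invented for illustration, and counting partners across all input pairs, rather than only the pairs that survive the first two filters, is an assumption because the text does not specify this detail.

```python
from collections import defaultdict

def nonpromiscuous_coding_pairs(gene_pairs):
    """Keep gene pairs that pass all three criteria described in the text.

    Each record is a dict with hypothetical keys:
      gene_a, gene_b    -- identifiers of the two connected genes
      noncoding_strand  -- True if any mate-pair derives from the noncoding strand
      mitochondrial     -- True if either gene is in the mitochondrial genome
    """
    # Record every distinct partner each gene is paired with (assumption:
    # promiscuity is judged over all pairs, not only the filtered ones).
    partners = defaultdict(set)
    for p in gene_pairs:
        partners[p["gene_a"]].add(p["gene_b"])
        partners[p["gene_b"]].add(p["gene_a"])

    kept = []
    for p in gene_pairs:
        if p["noncoding_strand"] or p["mitochondrial"]:
            continue
        # A pair is promiscuous if either gene has more than one partner.
        if len(partners[p["gene_a"]]) > 1 or len(partners[p["gene_b"]]) > 1:
            continue
        kept.append((p["gene_a"], p["gene_b"]))
    return kept
```

In the dataset described above, a filter of this kind would leave only the 17 coding, nonpromiscuous pairs out of 128,958.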
Fig. 2. Trans-splicing of mod(mdg4) and lola. The sequencing results obtained for mod(mdg4) (A) and lola (B) are shown. The horizontal gray line separates the sense and antisense exons of mod(mdg4). The 3′ terminal exon groups for which deep sequencing data support trans-splicing (green), only cis-splicing (red), or are not expressed in the hybrid (gray) are shown. Isoforms for which trans-splicing was previously reported are depicted with an asterisk. The number of cis– (red) and trans– (green) mate-pairs observed for each isoform of mod(mdg4) and lola are shown (bar graphs).
Discussion
The approach described in this study is unique in providing a genome-wide survey of trans-splicing, and reveals that trans-splicing between protein-coding exons is more widespread than previously appreciated. At the same time, our results indicate that trans-splicing in Drosophila is extremely specific. Interestingly, homologous chromosomes are paired in Drosophila somatic cells (17) and chromosomal pairing appears to be required for efficient lola trans-splicing (7). This suggests the possibility that chromosomal pairing may be a general requirement for efficient, specific trans-splicing between homologous genes in Drosophila.
The candidate trans-spliced genes we identified can be grouped into three categories. The first class consists of genes that contain at least two alternative 3′ terminal exons, like mod(mdg4) and lola (Fig. 3A). The most notable example from this class is CG42235, in which trans–mate-pairs mapped to the CG42235-RD and CG42235-RE isoforms, both of which were validated. The second class contains genes with at least two alternative 5′ terminal exons, such as ome (Fig. 3B). The final category includes genes with large introns, which frequently contain nested genes within the intron, such as Nmdmc (Fig. 3C). Intriguingly, the architecture of trans-spliced genes in each class creates obstacles for the gene-expression machinery. For example, collisions of transcription complexes may occur in nested genes. For genes containing alternative 5′ terminal exons, use of distal exons requires active repression of proximal exons. Finally, for genes containing alternative 3′ terminal exons, it is necessary to actively repress all proximal 3′ splice sites, premature 3′ end formation, and transcription termination before synthesis and splicing of the distal exons. In each of these cases, trans-splicing of separate pre-mRNAs generated using distinct promoters and transcription termination sites would overcome all of these obstacles.
In some cases, the frequency of trans-splicing is very low. This finding may reflect a low background of “noisy” trans-splicing or a low level of strand-switching or sequencing errors that occurred only in the hybrid sample. Alternatively, the trans–mate-pairs from these genes could have resulted from cis-splicing of transcripts expressed in a small population of cells in which somatic recombination has occurred between the D. melanogaster and D. sechellia alleles. Although we cannot exclude these possibilities, we note that our validation experiments were performed using biological replicate samples. Thus, it seems unlikely that the same experimental errors or somatic recombination events would occur in multiple biological samples.
Our approach also allowed us to evaluate the extent of strand-switching that occurs in deep sequencing experiments. Most cross-chromosomal trans–mate-pairs we observed result from RT-PCR artifacts and do not represent biologically generated chimeric mRNAs. Consequently, we do not find credible evidence of intergenic chimeric RNA production in adult Drosophila. Because we observed a large number of false-positive chimeric RNA signals, our data further suggest that reports of chimeric RNAs should be treated with caution, especially when the supporting data are generated using RT-PCR. However, our results do not preclude the existence of chimeric RNAs in other species. For example, exons from the mosquito bursicon mRNA were recently found to be encoded on two separate chromosomes, suggesting that trans-splicing is required for bursicon mRNA synthesis (18). Another recent report described a chimeric RNA composed of exons from the human JAZF1 and JJAZ1 genes located on chromosomes 7 and 17, respectively (15). This chimeric RNA can be formed in in vitro splicing reactions, suggesting the possibility that the chimeric RNA can be produced via trans-splicing in vivo. However, this result does not entirely eliminate the possibility that the chimeric product was produced during RT-PCR amplification. Improvements in direct RNA sequencing (19, 20) should eventually allow the direct detection of any genuine chimeric RNAs, without the introduction of strand-switching artifacts inherent to reverse transcription and PCR amplification.
Regardless of the precise mechanism by which trans-splicing occurs and the purpose of trans-splicing, the results presented here identify several additional protein-coding genes that are trans-spliced in Drosophila. Because we have only examined trans-splicing in adult females, these results certainly underestimate the frequency of trans-splicing. Thus, deeper sequencing to analyze trans-splicing throughout Drosophila development will likely identify additional trans-spliced genes. Conducting similar experiments in other species, including humans, whose genomes contain many genes with long introns (e.g., c-Abl), multiple promoters (e.g., PCDHGA), and multiple 3′ terminal exons (e.g., IGHA1), may reveal that trans-splicing between protein-coding exons is even more ubiquitous.
Materials and Methods
Flies/Crosses. Flies were reared on standard cornmeal/molasses medium at 25 °C. The F1 hybrids used resulted from crossing 7 females of the D. melanogaster strain 14021–0231.36 (y[1]; Gr22b[1] Gr22d[1] cn[1] CG33964[R4.2] bw[1] sp[1]; LysC[1] MstProx[1] GstD5[1] Rh6[1]) with approximately 30 males of the D. sechellia strain 14021–0248.25 (wild-type). Only female hybrids are viable from this cross.
Library Preparation and Sequencing. mRNA sequencing libraries were prepared according to the manufacturer's specifications (Illumina). Total RNA was prepared from whole flies using TRIzol (Invitrogen) and treated with DNase I to remove any contaminating DNA. Nine micrograms of total RNA from hybrid and control (4.5 μg D. melanogaster RNA + 4.5 μg D. sechellia RNA) females was used as input for library preparation. Poly(A)+ RNA selected using Dynal magnetic beads (Invitrogen) was fragmented using RNA fragmentation reagent (Ambion), and reverse-transcribed using random primers and SuperScript II (Invitrogen). The resulting cDNA was size-selected (∼370 bp) on 2% agarose (TAE) gels. Libraries were subsequently prepared for sequencing using the Paired-end Genomic DNA Library kit (Illumina). Libraries were sequenced in six (hybrid) and four (control) lanes on an Illumina GAIIx using a 37-cycle paired-end sequencing protocol, and one (hybrid) and two (control) lanes using a 76-cycle paired-end protocol. Sequence reads from 76-cycle runs were trimmed to their first 37 bases for comparison with the other data.
Fig. 3. Examples of newly identified trans-spliced genes. Trans-splicing was validated using RT-PCR with primers specific to D. melanogaster (red) and D. sechellia (blue). Trans-splicing is validated by the presence of RT-PCR products when using opposite species forward and reverse primers with hybrid, but not mixed control (Mix) cDNA. Several clones of these putative trans-splicing products were sequenced to verify a clean transition of species-specific sequences at splicing junctions. (A) CG42235 contains a set of common 5′ exons which are trans-spliced to multiple alternative 3′ terminal exon groups. (B) The ome gene has multiple alternative transcription initiation exons which are trans-spliced to a set of common 3′ terminal exons. (C) Nmdmc is an example of trans-splicing of nested genes, as the gene Rel is located within the intron.
mRNA-seq Data Analysis. Sequence image analysis was performed using the Firecrest, Bustard, and GERALD programs (Illumina). Sequences were aligned separately to both the D. melanogaster (2006, dm3) and D. sechellia (droSec1) genome assemblies (21) using Bowtie (22). Allele-specific sequence read assignments were performed as previously described (23). Briefly, sequence reads were aligned requiring no mismatches, and alignment results were compared to identify sequences that aligned to only one genome and mapped to a single genomic location. The coordinates of D. sechellia-specific reads were converted to their syntenic D. melanogaster coordinates using the lift-over tool (http://genome.ucsc.edu). Species-specific sequence reads were mapped to all annotated exons (Flybase 5.11) using a custom perl script “exonhitter” (23).
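The species-assignment rule described above (a perfect match in only one genome, at a single location) can be illustrated with a short Python sketch. It does not call Bowtie or reproduce the authors' scripts; it assumes, hypothetically, that the alignment results have already been reduced to per-read counts of perfect-match locations in each genome, and the names used are placeholders.

```python
def assign_species(mel_hits, sec_hits):
    """Assign reads to a species when they align uniquely to one genome only.

    mel_hits, sec_hits: dicts mapping read ID -> number of perfect-match
    alignment locations in the D. melanogaster or D. sechellia genome
    (hypothetical inputs; a read absent from a dict has zero hits there).
    """
    assignments = {}
    for read_id in set(mel_hits) | set(sec_hits):
        n_mel = mel_hits.get(read_id, 0)
        n_sec = sec_hits.get(read_id, 0)
        if n_mel == 1 and n_sec == 0:
            assignments[read_id] = "mel"
        elif n_sec == 1 and n_mel == 0:
            assignments[read_id] = "sec"
        # Reads matching both genomes, or several locations, stay unassigned.
    return assignments
```

Reads that match both genomes carry no allele information, and reads with several perfect-match locations cannot be placed unambiguously, so both kinds are left out of the result.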
Additional custom scripts were used to identify cis– and trans–mate-pairs and for further downstream analyses. Mate-pairs were first examined to identify pairs whose reads mapped to different exons. These pairs were further separated into “same gene” and “different gene” categories if the ends of the pair mapped to the same or different genes, respectively. “Different gene” read-pairs were parsed into same- and different-chromosome categories, if the genes to which they mapped were located on the same or different chromosomes. The number of mate-pairs mapping to each exon pair was counted.
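A minimal Python sketch of this categorization step follows. The authors used custom scripts that are not shown in the paper, so the data layout here (each read represented as a hypothetical `(gene, exon, chromosome)` tuple) and the function names are assumptions made for illustration.

```python
from collections import Counter

def categorize_mate_pair(hit1, hit2):
    """Categorize a mate-pair from the exon each read maps to.

    Each hit is a (gene, exon, chromosome) tuple (hypothetical layout).
    Returns None when both reads fall in the same exon.
    """
    gene1, exon1, chrom1 = hit1
    gene2, exon2, chrom2 = hit2
    if (gene1, exon1) == (gene2, exon2):
        return None  # same exon: not a candidate for splicing
    if gene1 == gene2:
        return "same_gene"
    if chrom1 == chrom2:
        return "different_gene_same_chromosome"
    return "different_gene_different_chromosome"

def count_exon_pairs(mate_pairs):
    """Count mate-pairs per unordered exon pair, skipping same-exon pairs."""
    counts = Counter()
    for hit1, hit2 in mate_pairs:
        if categorize_mate_pair(hit1, hit2) is None:
            continue
        # Sort so that (exon A, exon B) and (exon B, exon A) share one key.
        key = tuple(sorted([hit1[:2], hit2[:2]]))
        counts[key] += 1
    return counts
```

Sorting the two exon identifiers treats a mate-pair the same way regardless of which read was sequenced first, which seems appropriate for a library that is not strand-specific.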
The total number of cis- and trans- “same gene” mate-pairs was calculated for each gene. All genes with at least one hybrid trans–mate-pair were considered as trans-splicing candidates. Custom browser tracks were generated to view the location of allele-specific sequence reads and trans–mate-pairs on the University of California–Santa Cruz genome browser. The trans-splicing candidate genes were visually evaluated to identify genes whose hybrid and negative control trans–mate-pairs align to the same sets of SNPs, which is indicative of strand-switching and mapping bias because of reference genome errors (Fig. S2). Candidate trans-splicing events containing these potential sources of error were not considered further.
The mRNA-seq protocol used in this study results in sequences that are not strand-specific (i.e., one does not know from which strand an observed mRNA-seq read was generated). However, the relative strands of each sequence mate-pair can be analyzed. If a mate-pair was generated from a continuous mRNA, the mate-pair reads should map to opposite strands in the reference genome. We used this relative strand information to determine whether the reads in putative chimeric read-pairs could both come from a protein-coding sequence. For example, if two genes were encoded on the positive DNA strand, the reads in a chimeric mate-pair derived from the coding sequence of both genes would align to opposite DNA strands. If the reads in a mate-pair aligned to the same DNA strand, the sequence from one read in the pair must have originated from the noncoding strand of a gene. We calculated the frequency of coding and noncoding mate-pairs for each apparent chimeric junction between two different genes using a custom perl script. Custom scripts were also used to identify genes with multiple chimeric junctions (gene1 to gene2, gene1 to gene3, and so forth). Tissue-specific gene expression patterns were downloaded from FlyAtlas (http://flyatlas.org/) (16) and custom perl scripts were used to compare the expression patterns of genes in each potential chimeric gene pair. Genes were considered to be expressed in a tissue if all four of the microarray experiments reported expression.
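The relative-strand test can be written compactly. The sketch below is in Python rather than the authors' perl, and it generalizes the same-strand example given in the text to genes on opposite strands. That generalization is an assumption on our part: it follows from the geometry of paired-end reads but is not stated in the paper, and the function names are hypothetical.

```python
def is_coding_consistent(read1_strand, read2_strand, gene1_strand, gene2_strand):
    """Check whether both reads could derive from the coding strands.

    Strands are '+' or '-'. For two genes on the same DNA strand, reads from
    a chimeric mRNA should align to opposite strands. For genes on opposite
    strands, the expectation flips (assumed generalization, see lead-in).
    """
    reads_opposite = read1_strand != read2_strand
    genes_same_strand = gene1_strand == gene2_strand
    return reads_opposite == genes_same_strand

def noncoding_fraction(mate_pairs):
    """Fraction of mate-pairs at a junction that imply a noncoding-strand read.

    Each element is (read1_strand, read2_strand, gene1_strand, gene2_strand).
    """
    if not mate_pairs:
        return 0.0
    noncoding = sum(1 for mp in mate_pairs if not is_coding_consistent(*mp))
    return noncoding / len(mate_pairs)
```

Because the library is not strand-specific, only the relationship between the two read strands is used, never the absolute strand of either read.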
Validation. RNA from different biological replicates was reverse-transcribed using SuperScript II RT (Invitrogen) to prepare cDNA for validation PCR. Species-specific primers (Table S6) were designed for 23 isoforms of 20 candidate genes [including mod(mdg4) and lola]. Species-specific PCR amplification was successful for 11 genes, failed completely (no product was generated) for 3 genes, and was nonspecific for 6 genes. RT-PCR products generated from hybrid cDNA were cloned and sequenced to verify clean SNP transitions at exon-exon junctions (Fig. S3).
ACKNOWLEDGMENTS. We thank members of the B.R.G. laboratory for discussions and comments on the manuscript, Thom Theara for assistance with the Illumina GAIIx, and the University of Connecticut Health Center Translational Genomics Core Facility for use of the instrument. This work was supported by National Institutes of Health Grant GM062516 (to B.R.G.).
1. Konarska MM, Padgett RA, Sharp PA (1985) Trans splicing of mRNA precursors in vitro. Cell 42:165–171.
2. Solnick D (1985) Trans splicing of mRNA precursors. Cell 42:157–164.
3. Sutton RE, Boothroyd JC (1986) Evidence for trans splicing in trypanosomes. Cell 47:527–535.
4. Nilsen TW (2001) Evolutionary origin of SL-addition trans-splicing: Still an enigma. Trends Genet 17:678–680.
5. Dorn R, Reuter G, Loewendorf A (2001) Transgene analysis proves mRNA trans-splicing at the complex mod(mdg4) locus in Drosophila. Proc Natl Acad Sci USA 98:9724–9729.
6. Labrador M, et al. (2001) Protein encoding by both DNA strands. Nature 409:1000.
7. Horiuchi T, Giniger E, Aigaki T (2003) Alternative trans-splicing of constant and variable exons of a Drosophila axon guidance gene, lola. Genes Dev 17:2496–2501.
8. Gingeras TR (2009) Implications of chimaeric non-co-linear transcripts. Nature 461:206–211.
9. Cocquet J, Chong A, Zhang G, Veitia RA (2006) Reverse transcriptase template switching and false alternative transcripts. Genomics 88:127–131.
10. Odelberg SJ, Weiss RB, Hata A, White R (1995) Template-switching during DNA synthesis by Thermus aquaticus DNA polymerase I. Nucleic Acids Res 23:2049–2057.
11. Tasic B, et al. (2002) Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol Cell 10:21–33.
12. Li X, Zhao L, Jiang H, Wang W (2009) Short homologous sequences are strongly associated with the generation of chimeric RNAs in eukaryotes. J Mol Evol 68:56–65.
13. Zhang G, et al. (2010) Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res 20:646–654.
14. Gabler M, et al. (2005) Trans-splicing of the mod(mdg4) complex locus is conserved between the distantly related species Drosophila melanogaster and D. virilis. Genetics 169:723–736.
15. Li H, Wang J, Mor G, Sklar J (2008) A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science 321:1357–1361.
16. Chintapalli VR, Wang J, Dow JA (2007) Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet 39:715–720.
17. Metz CW (1916) Chromosome studies on the Diptera II. The paired association of chromosomes in the Diptera and its significance. J Exp Zool 21:213–279.
18. Robertson HM, Navik JA, Walden KK, Honegger HW (2007) The bursicon gene in mosquitoes: An unusual example of mRNA trans-splicing. Genetics 176:1351–1353.
19. Ozsolak F, et al. (2009) Direct RNA sequencing. Nature 461:814–818.
20. Mamanova L, et al. (2010) FRT-seq: Amplification-free, strand-specific transcriptome sequencing. Nat Methods 7:130–132.
21. Clark AG, et al.; Drosophila 12 Genomes Consortium (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218.
22. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
23. McManus CJ, et al. (2010) Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res 20:816–825.
McManus et al. PNAS | July 20, 2010 | vol. 107 | no. 29 | 12979
Corrections
GENETICS
Correction for “Global analysis of trans-splicing in Drosophila,” by C. Joel McManus, Michael O. Duff, Jodi Eipper-Mains, and Brenton R. Graveley, which appeared in issue 29, July 20, 2010, of Proc Natl Acad Sci USA (107:12975–12979; first published July 1, 2010; 10.1073/pnas.1007586107).
The authors note that, within the supporting information Web link “http://intron.ccam.uchc.edu/Graveley/Publications/Publications.html” should be removed. Tables S1–S6 have been added to the online publication. The online version has been corrected.
www.pnas.org/cgi/doi/10.1073/pnas.1304972110
IMMUNOLOGY
Correction for “Association of RIG-I with innate immunity of ducks to influenza,” by Megan R. W. Barber, Jerry R. Aldridge, Jr., Robert G. Webster, and Katharine E. Magor, which appeared in issue 13, March 30, 2010, of Proc Natl Acad Sci USA (107:5913–5918; first published March 22, 2010; 10.1073/pnas.1001755107).
The authors note that on page 5917, right column, second full paragraph, line 12 “5′-GTG TAT GGA GGA AAA CCC TAT TTC TTA ACT-3′” should instead appear as “5′-GTG TAT GGA GGA AAA CCC TAT TCT TAA CT-3′”.
www.pnas.org/cgi/doi/10.1073/pnas.1306250110
MEDICAL SCIENCES
Correction for “Prolonged nerve blockade delays the onset of neuropathic pain,” by Sahadev A. Shankarappa, Jonathan H. Tsui, Kristine N. Kim, Gally Reznor, Jenny C. Dohlman, Robert Langer, and Daniel S. Kohane, which appeared in issue 43, October 23, 2012, of Proc Natl Acad Sci USA (109:17555–17560; first published October 8, 2012; 10.1073/pnas.1214634109).
The authors note that the following statement should be added as a new Acknowledgments section: “This work was supported by National Institute of General Medical Sciences Grant GM073626 (to D.S.K.).”
www.pnas.org/cgi/doi/10.1073/pnas.1306394110
BIOCHEMISTRY, ENVIRONMENTAL SCIENCES
Correction for “Proteomic analysis of skeletal organic matrix from the stony coral Stylophora pistillata,” by Jeana L. Drake, Tali Mass, Liti Haramaty, Ehud Zelzion, Debashish Bhattacharya, and Paul G. Falkowski, which appeared in issue 10, March 5, 2013, of Proc Natl Acad Sci USA (110:3788–3793; first published February 19, 2013; 10.1073/pnas.1301419110).
The authors note that Table 1 appeared incorrectly. Within the Name column, “CARP8” should instead appear as “CARP4,” and “CARP9” should instead appear as “CARP5.” These errors do not affect the conclusions of the article.
www.pnas.org/cgi/doi/10.1073/pnas.1305081110
7958–7959 | PNAS | May 7, 2013 | vol. 110 | no. 19 www.pnas.org
Table 1. Thirty-six predicted proteins in S. pistillata SOM samples detected by LC-MS/MS and their bioinformatics analysis
Columns: Protein; Gene; Accession no.; Name; P. damicornis; A. digitifera; Favia sp.; N. vectensis; P. maxima; S. purpuratus; E. huxleyii; R. filosa; H. sapiens; T. pseudonana.
Returned sequences with e-values ≤10^−10 are presented in order of decreasing e-value. “Protein name” is the best BLAST hit in NCBI. “Gene” is the code number in our S. pistillata gene prediction model. The “+” and “–” represent presence and absence, respectively, of similar sequences in comparison species.
*Sequence similarity is greater than 70%.
†Most similar sequence by bit score.
‡Indicates export signal.