Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling...

6
Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xie a , Mihaela Babiceanu a,1 , Shailesh Kumar a , Yuemeng Jia a , Fujun Qin a , Frederic G. Barr b , and Hui Li a,c,2 a Department of Pathology, University of Virginia, Charlottesville, VA 22908; b Laboratory of Pathology, National Cancer Institute, Bethesda, MD 20892; and c University of Virginia Cancer Center, Charlottesville, VA 22908 Edited by Peter K. Vogt, The Scripps Research Institute, La Jolla, CA, and approved September 27, 2016 (received for review August 2, 2016) Gene fusions and fusion products were thought to be unique features of neoplasia. However, more and more studies have identified fusion RNAs in normal physiology. Through RNA sequencing of 27 human noncancer tissues, a large number of fusion RNAs were found. By analyzing fusion transcriptome, we observed close clusterings be- tween samples of same or similar tissues, supporting the feasibility of using fusion RNA profiling to reveal connections between biological samples. To put the concept into use, we selected alveolar rhabdo- myosarcoma (ARMS), a myogenic pediatric cancer whose exact cell of origin is not clear. PAX3FOXO1 (paired box gene 3 fused with fork- head box O1) fusion RNA, which is considered a hallmark of ARMS, was recently found during normal muscle cell differentiation. We per- formed and analyzed RNA sequencing from various time points during myogenesis and uncovered many chimeric fusion RNAs. Interestingly, we found that the fusion RNA profile of RH30, an ARMS cell line, is most similar to the myogenesis time point when PAX3FOXO1 is expressed. In contrast, full transcriptome clustering analysis failed to uncover this connection. Strikingly, all of the 18 chimeric RNAs in RH30 cells could be detected at the same myogenic time point(s). In addition, the seven chimeric RNAs that follow the exact transient expression pattern as PAX3FOXO1 are specific to rhabdomyosarcoma cells. Fur- ther testing with clinical samples also confirmed their specificity to rhabdomyosarcoma. These results provide further support for the link between at least some ARMSs and the PAX3FOXO1-expressing myo- genic cells and demonstrate that fusion RNA profiling can be used to investigate the etiology of fusion-gene-associated cancers. rhabdomyosarcoma | gene fusion | chimeric RNA | myogenesis | PAX3-FOXO1 C hromosomal rearrangements that lead to gene fusions are well-known cancer-causing genetic events that are actively used in clinical cancer diagnosis, and their fusion products have been shown to be effective targets of directed therapy (1, 2). Prominent examples include BCRABL in chronic myelogenous leukemia (3), with the development of imatinib as a paradigm for targeted therapy (4); the commonly occurring TMPRSS2ERG fusion in prostate cancer (5); and EML4ALK and other ALK fusions in lung cancer, which are effectively targeted by crizotinib (6, 7). The prevailing view is that gene fusions and corresponding fusion products (RNA and protein) are generated due to chro- mosomal rearrangement at the DNA level and thus are unique to cancer. Even though some gene fusions and fusion transcripts have been reported to be present in nonneoplastic samples (810), they tend to be expressed at low levels and may be from the precursors of cancer cells. We and others have shown that fusion transcripts can be detected in normal human cell lines (11) and tissues (1215) and can be produced by mechanisms other than chromosomal rearrangement. From our analysis of 210 paired-end RNA sequencing libraries from 171 nonneoplastic tissue samples covering 27 different tissues, a large number of candidate fusion transcripts were uncovered using the SOAPfuse software (15), contradictory to the conventional wisdom that chimeric RNAs are rare events in noncancer samples. When using the dataset of fusion RNAs to perform clustering analysis, we noticed that samples from the same or similar tissues tended to be grouped together. This observation led us to propose the idea of using fusion transcriptome profiling as a new approach to identify connections between biological samples. It is well known that specific oncogenic fusions are associated with specific cancer types and thus are useful diagnostic bio- markers. In some situations, the biological basis of this specificity may be rendered by the parental gene expression, as in the case of fusions involving Ig locus. In other situations, the mechanism is not entirely clear. However, recent discoveries on RNA trans- splicing in normal cells (16, 17) may shed some light on this enigma. For example, the JAZF1SUZ12 fusion transcript, which is generated in endometrial stromal sarcomas by a chromosomal translocation event, is produced by RNA trans-splicing in non- transformed, normal endometrial stromal cells (17, 18). It is possible that oncogenic gene fusions may be selected because they recapitulate processes in cell development or physiology and that normal cells expressing the signature fusion products may be the cells of origin for a particular tumor. We thus hypothesize that by examining the profile of fusion transcriptome, new insights can be gained about the cell of origin for some cancer. To test the idea, we selected alveolar rhabdo- myosarcoma (ARMS) as a model. In studies of ARMS, beyond its association with the skeletal muscle lineage, the exact cell of origin remains heavily debated (19, 20). Previously, we found that, during normal skeletal muscle differentiation (myogenesis), a PAX3FOXO1 (paired box gene 3 fused with forkhead box O1) fusion mRNA and protein are expressed in the absence of chromosomal rearrangement (21). This very fusion protein has Significance Here, we propose an approach to study connections between biological samples. By using binary input of fusion RNA expres- sion, samples of same or similar tissue origin were clustered together. The concept was then put into use to gain insights for the pediatric alveolar rhabdomyosarcoma (ARMS). We found that the signature fusion RNA for ARMS, PAX3FOXO1 (paired box gene 3 fused with forkhead box O1), and all the other chi- meric RNAs expressed in ARMS cells are expressed at the same normal myogenic time point(s). Several chimeric RNAs were further confirmed to be specifically expressed in clinical rhabdo- myosarcoma tumor cases. These results support the link between at least some ARMS tumors and the PAX3FOXO1-expressing myogenic cell. Fusion RNA profiling is a tool to investigate the etiology of fusion-gene-associated cancers. Author contributions: H.L. designed research; Z.X., Y.J., and F.Q. performed research; F.G.B. contributed new reagents/analytic tools; Z.X., M.B., S.K., Y.J., and H.L. analyzed data; and Z.X., F.G.B., and H.L. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Data deposition: The sequences reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE64032). 1 Present address: Sanofi Pasteur, Orlando, FL 32826. 2 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1612734113/-/DCSupplemental. 1312613131 | PNAS | November 15, 2016 | vol. 113 | no. 46 www.pnas.org/cgi/doi/10.1073/pnas.1612734113 Downloaded by guest on June 26, 2020

Transcript of Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling...

Page 1: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

Fusion transcriptome profiling provides insights intoalveolar rhabdomyosarcomaZhongqiu Xiea, Mihaela Babiceanua,1, Shailesh Kumara, Yuemeng Jiaa, Fujun Qina, Frederic G. Barrb, and Hui Lia,c,2

aDepartment of Pathology, University of Virginia, Charlottesville, VA 22908; bLaboratory of Pathology, National Cancer Institute, Bethesda, MD 20892;and cUniversity of Virginia Cancer Center, Charlottesville, VA 22908

Edited by Peter K. Vogt, The Scripps Research Institute, La Jolla, CA, and approved September 27, 2016 (received for review August 2, 2016)

Gene fusions and fusion products were thought to be unique featuresof neoplasia. However, more and more studies have identified fusionRNAs in normal physiology. Through RNA sequencing of 27 humannoncancer tissues, a large number of fusion RNAs were found. Byanalyzing fusion transcriptome, we observed close clusterings be-tween samples of same or similar tissues, supporting the feasibility ofusing fusion RNA profiling to reveal connections between biologicalsamples. To put the concept into use, we selected alveolar rhabdo-myosarcoma (ARMS), a myogenic pediatric cancer whose exact cell oforigin is not clear. PAX3–FOXO1 (paired box gene 3 fused with fork-head box O1) fusion RNA, which is considered a hallmark of ARMS,was recently found during normal muscle cell differentiation. We per-formed and analyzed RNA sequencing from various time points duringmyogenesis and uncovered many chimeric fusion RNAs. Interestingly,we found that the fusion RNA profile of RH30, an ARMS cell line, ismost similar to the myogenesis time point when PAX3–FOXO1 isexpressed. In contrast, full transcriptome clustering analysis failed touncover this connection. Strikingly, all of the 18 chimeric RNAs in RH30cells could be detected at the samemyogenic time point(s). In addition,the seven chimeric RNAs that follow the exact transient expressionpattern as PAX3–FOXO1 are specific to rhabdomyosarcoma cells. Fur-ther testing with clinical samples also confirmed their specificity torhabdomyosarcoma. These results provide further support for the linkbetween at least some ARMSs and the PAX3–FOXO1-expressing myo-genic cells and demonstrate that fusion RNA profiling can be used toinvestigate the etiology of fusion-gene-associated cancers.

rhabdomyosarcoma | gene fusion | chimeric RNA | myogenesis |PAX3-FOXO1

Chromosomal rearrangements that lead to gene fusions arewell-known cancer-causing genetic events that are actively

used in clinical cancer diagnosis, and their fusion products havebeen shown to be effective targets of directed therapy (1, 2).Prominent examples include BCR–ABL in chronic myelogenousleukemia (3), with the development of imatinib as a paradigm fortargeted therapy (4); the commonly occurring TMPRSS2–ERGfusion in prostate cancer (5); and EML4–ALK and other ALKfusions in lung cancer, which are effectively targeted by crizotinib(6, 7). The prevailing view is that gene fusions and correspondingfusion products (RNA and protein) are generated due to chro-mosomal rearrangement at the DNA level and thus are uniqueto cancer. Even though some gene fusions and fusion transcriptshave been reported to be present in nonneoplastic samples (8–10), they tend to be expressed at low levels and may be from theprecursors of cancer cells. We and others have shown that fusiontranscripts can be detected in normal human cell lines (11) andtissues (12–15) and can be produced by mechanisms other thanchromosomal rearrangement.From our analysis of 210 paired-end RNA sequencing libraries

from 171 nonneoplastic tissue samples covering 27 different tissues,a large number of candidate fusion transcripts were uncovered usingthe SOAPfuse software (15), contradictory to the conventionalwisdom that chimeric RNAs are rare events in noncancer samples.When using the dataset of fusion RNAs to perform clusteringanalysis, we noticed that samples from the same or similar tissues

tended to be grouped together. This observation led us to proposethe idea of using fusion transcriptome profiling as a new approachto identify connections between biological samples.It is well known that specific oncogenic fusions are associated

with specific cancer types and thus are useful diagnostic bio-markers. In some situations, the biological basis of this specificitymay be rendered by the parental gene expression, as in the caseof fusions involving Ig locus. In other situations, the mechanismis not entirely clear. However, recent discoveries on RNA trans-splicing in normal cells (16, 17) may shed some light on thisenigma. For example, the JAZF1–SUZ12 fusion transcript, whichis generated in endometrial stromal sarcomas by a chromosomaltranslocation event, is produced by RNA trans-splicing in non-transformed, normal endometrial stromal cells (17, 18). It ispossible that oncogenic gene fusions may be selected becausethey recapitulate processes in cell development or physiologyand that normal cells expressing the signature fusion productsmay be the cells of origin for a particular tumor.We thus hypothesize that by examining the profile of fusion

transcriptome, new insights can be gained about the cell of originfor some cancer. To test the idea, we selected alveolar rhabdo-myosarcoma (ARMS) as a model. In studies of ARMS, beyondits association with the skeletal muscle lineage, the exact cell oforigin remains heavily debated (19, 20). Previously, we foundthat, during normal skeletal muscle differentiation (myogenesis),a PAX3–FOXO1 (paired box gene 3 fused with forkhead box O1)fusion mRNA and protein are expressed in the absence ofchromosomal rearrangement (21). This very fusion protein has

Significance

Here, we propose an approach to study connections betweenbiological samples. By using binary input of fusion RNA expres-sion, samples of same or similar tissue origin were clusteredtogether. The concept was then put into use to gain insights forthe pediatric alveolar rhabdomyosarcoma (ARMS). We foundthat the signature fusion RNA for ARMS, PAX3–FOXO1 (pairedbox gene 3 fused with forkhead box O1), and all the other chi-meric RNAs expressed in ARMS cells are expressed at the samenormal myogenic time point(s). Several chimeric RNAs werefurther confirmed to be specifically expressed in clinical rhabdo-myosarcoma tumor cases. These results support the link betweenat least some ARMS tumors and the PAX3–FOXO1-expressingmyogenic cell. Fusion RNA profiling is a tool to investigate theetiology of fusion-gene-associated cancers.

Author contributions: H.L. designed research; Z.X., Y.J., and F.Q. performed research; F.G.B.contributed new reagents/analytic tools; Z.X., M.B., S.K., Y.J., and H.L. analyzed data;and Z.X., F.G.B., and H.L. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GeneExpression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no.GSE64032).1Present address: Sanofi Pasteur, Orlando, FL 32826.2To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1612734113/-/DCSupplemental.

13126–13131 | PNAS | November 15, 2016 | vol. 113 | no. 46 www.pnas.org/cgi/doi/10.1073/pnas.1612734113

Dow

nloa

ded

by g

uest

on

June

26,

202

0

Page 2: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

been shown to be critical in the development of the ARMS (22),in which the predominant driver mutation is a characteristic 2;13chromosomal translocation resulting in formation of the fusion.We postulate that a unique population of normal muscle pro-genitor cells is permissive for PAX3–FOXO1 expression. Thesepermissive myogenic cells represent intriguing candidates for thecells of origin for ARMS. This is a similar concept to “condi-tioned tumorigenicity,” in which the tumorigenic effect of a givenoncogene is restricted to specific and often quite narrow differ-entiation or maturation windows within each susceptible lineage(23). To further support the connection between normal myo-genesis and ARMS, we put the concept of fusion transcriptomeprofiling into use.

ResultsFusion RNA Profiling Connects Samples with Same or Similar TissueOrigin. From paired-end RNA sequencing libraries covering 27different tissues of 95 noncancer donors, we detected 16,026 fusiontranscripts using a different software, EricScript (24), which is fasterand yielded more fusion transcripts for downstream analysis. Similarto our previous findings, we found that fusion RNA profiles in eachtissue or cell type were very different (Fig. S1). We then focused on2,208 recurrent fusions that were detected in more than one sample.The majority of recurrent fusions occurred in only one or two tissuetypes (578 and 752, respectively). Only a small percent (15%) arefound in more than five tissue types (Fig. 1A).We performed unsupervised clustering analysis based on fusion

RNA expression status (present or absent) in the tissue samples.Most samples from the same tissue type were found to clustertogether (Fig. 1B). For example, all testis and placental sampleswere in their own clusters, respectively. Samples from tissues withthe same embryologic origin were also grouped together (colon,duodenum, stomach, and small intestine). Similarly, traditionaltranscriptome clustering with fragments per kilobase of transcriptper million mapped reads (FPKM) of gene expression alsogrouped most samples of the same or similar tissue type together,with a few exceptions (Fig. 1C). For instance, placenta tissues arescattered. These results suggest that the fusion RNA profiling, liketraditional transcriptome, can also be used for clustering analysis.In addition, fusion RNA profiling has the advantage of using abinary input (present or absent), whereas the level of expressionhas to be used for traditional transcriptome profiling.

Numerous Chimeric RNAs Were Transiently Expressed During Myogenesis.To apply this methodology to a more practical question, we usedthis fusion transcriptome profiling approach to investigate the cellof origin of the pediatric soft tissue tumor alveolar rhabdomyosar-coma (ARMS). Previously, we detected the PAX3–FOXO1 fusionRNA, which is considered the signature of ARMS, during normalmuscle-cell differentiation (21). To gain more insight into theconnection between normal myogenesis and ARMS, we performedRNA-sequencing (RNA-seq) at four time points during the processof mesenchymal stem cell (MSC) muscle differentiation (Fig. 2A).A total of 104 chimeric fusion RNAs were identified in samplescollected at these four different time points of myogenesis. Weidentified 33 chimeric fusion RNAs in the ARMS cell line, RH30(Fig. 2B and Table S1). Of these 137 fusion transcripts, we ran-domly selected 40 candidates and successfully validated 30 by RT-PCR and traditional Sanger sequencing (Fig. 2C and Fig. S2). Wecategorized these fusion RNAs into three groups: fusions involvingparental genes located on different chromosomes (INTERCHR),fusions involving neighboring genes transcribing the same strand(INTRACHR-SS-0GAP), and other fusions with parental genes onthe same chromosome (INTRACHR-OTHER) (Table S1). Ap-proximately half of the chimeric RNAs were in the category ofINTRACHR-SS-0GAP (72 total), which are candidates for cis-splicing between adjacent genes (cis-SAGe). The remaining 65 werepotential products of RNA trans-splicing. Of 137 chimeric RNAs,59 fusions had junction sites in the UTR region, thus not changingprotein-coding sequence (NA) (Table S1). In 40 fusions, the proteincoding sequence of the 3′ gene used a different reading frame than

the 5′ gene (frame-shift). In the remaining 36, the reading frame ofthe 3′ gene was the same as the 5′ gene (in-frame). Two fusionswere labeled “both” because of alternative splicing isoforms thatwould result in both frame-shift and in-frame. In-frame fusions weresignificantly enriched (P < 0.05).Of note, the majority of these chimeric mRNAs were only seen

at one time point during the differentiation process and were notseen in undifferentiated cells, indicating their transient nature.The detection of these fusion RNAs at different time points didnot correlate with the expression of the 5′ or 3′ parental gene,arguing against the possibility of nonspecific splicing caused byhigh-level expression of the parental transcripts (Fig. S3).

Fusion Transcriptome Analysis Revealed That RH30 and Time Point 3Have Similar Chimeric RNA Profiles. PAX3–FOXO1 was transientlydetected at time point 3 (T3) during myogenic differentiationof the MSCs. Interestingly, four other chimeras—MARS–AVIL,

Fig. 1. Fusion transcriptome profiling grouped samples of same or similartissues together. (A) A large number of fusions were identified from RNA-seqof 27 nonneoplastic human tissues. The majority of fusions are seen in one ortwo tissue types, and a small fraction is seen in more than five tissue types.(B) Unsupervised clustering of fusion RNA expression (presence or absence)grouped samples from the same tissue type together, as in the case of allplacental samples. This analysis used binary format with 1 indicating that thefusion was present (red) and 0 indicating that the fusion was absent (green).We used a variance filter value of 50. Heat maps of the “sample tree” wereobtained by “complete lineage clustering” of filtered data. The Pearson cor-relation matrix was used in this analysis. (C) Similarly, unsupervised clusteringwith traditional transcriptome expression data was performed.

Xie et al. PNAS | November 15, 2016 | vol. 113 | no. 46 | 13127

GEN

ETICS

Dow

nloa

ded

by g

uest

on

June

26,

202

0

Page 3: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

LINC0045–MEDAG, SEPT7P2–PSPH, and ENO1–EDARADD—were also found by RNA-seq in both the T3 sample and RH30,but not at any other time points (Fig. 2B). It is unlikely that thisco-occurrence is due to random chance (P = 2.7e-06), because T3has the smallest number of fusions (n = 9), compared to the threeother time points (T1, n = 24; T2, n = 49; and T4, n = 22). Anunsupervised clustering analysis based on chimeric RNA expres-sion demonstrated the closest similarity between T3 and RH30(Fig. 3A), consistent with the hypothesis that T3 has a populationof cells that are potential cells of origin for ARMS.We then used full transcriptome data to perform a similar

unsupervised clustering analysis. Overall, the samples from thefour differentiation time points were clustered closer to eachother than to RH30 (Fig. 3B). This result is not surprising, be-cause the four time points are all derived from the same MSCs.The lack of connection between T3 and RH30 could be theoverall high noise level associated with 24,632 entries (FPKM > 0 in

any of the five samples), although a selected group of 1,000transcripts that had the biggest variation among the five samplesalso failed to uncover the connection. When we examined closer,we found some common changes in gene expression in both T3and RH30 when they were compared with T1. To investigatewhether the commonality of gene expression between T3 andRH30 is due to the activity of PAX3–FOXO1, a potent tran-scription activator, we compared the list of up-regulated genesin T3 and RH30 (compared with T1) with the reported PAX3–FOXO1 targets elucidated in a chromatin immunoprecipitationsequencing study (25). Of the 1,075 up-regulated genes shared byT3 and RH30 cells, ∼9% were targets of PAX3–FOXO1 (98 of alist of 1,072; P < 0.001) (Fig. 3C), including a few genes encodingmyogenic factors that are well-known PAX3–FOXO1 down-stream targets,MYOD, EYA2, and EYA4 (25). Gene Set EnrichmentAnalysis (26) confirmed the enrichment of PAX3–FOXO1 tar-gets in the commonly up-regulated genes in T3 and RH30 (en-richment score = 0.8, P < 0.001) (Fig. 3D). These results indicatethat transcriptome analyses could also potentially reveal theconnection between T3 and RH30 cells, but requires more su-pervised data mining.

All of the Chimeric RNAs in RH30 Could Be Detected at the Time Point(s)of Myogenesis When PAX3–FOXO1 Was Expressed. In a validationexperiment, we repeated the muscle differentiation experimentusing a different MSC line, human embryonic stem cell-derivedMSCs (hES-MSCs). Considering the high false-negative rate as-sociated with RNA-seq and fusion-mining software, we performedRT-PCR analyses on the samples collected throughout the dif-ferentiation process. In this system, PAX3–FOXO1 was transientlyexpressed at mostly days 2 and 6 and was not detected at later timepoints (Fig. 4A). The disappearance at day 4 and reappearance atday 6 is consistent with the wavering expression pattern detectedin some of our previous muscle differentiation experiments (21).Of note, this wavering pattern is not seen with the expression ofthe parental gene PAX3 or FOXO1 (Fig. S4), further arguingagainst the possibility that the fusion is produced due to over-whelming expression of its parental genes. The four other chimericRNAs previously seen at time point T3 of MSC differentiationand RH30 were also seen at days 2 and 6 of hES-MSC differen-tiation (Fig. 4A). We then examined the expression of all other

Fig. 2. More than 100 novel chimeric RNAs throughout myogenesis wererevealed by RNA-seq. (A) Myogenic differentiation was induced in MSCs bydifferentiation medium (DM). RNA was extracted from samples collected atfour time points. RT-PCR detected PAX3–FOXO1 and GAPDH. (B) Circos plotdepicting chimeric RNAs discovered across the genome. Lines connect pa-rental genes. Half lines are due to the close proximity between the twoparental genes. Outer ring, chromosomes; innermost ring, lines denote thechimeric RNAs connecting two parental genes; middle two rings, heat mapand histogram depicting gene expressions. Arrows point to the five chimerascommon to T3 and RH30 samples. (C) RT-PCR and Sanger sequencing vali-dation. Representative examples of fusion RNAs amplified by RT-PCR andfractionated by gel electrophoresis are shown. Bands were excised from thegel and processed for Sanger sequencing. The asterisks (*) indicate thecorrect band when multiple bands were seen. Vertical dashed red lines inSanger sequencing results mark the fusion junction site.

Fig. 3. Similar chimeric fusion RNA profiles between T3 myogenesis sampleand ARMS cell line RH30. (A) Fusion RNA profiling analysis revealed the closestsimilarity between T3 and RH30. This analysis used binary format, with 1 in-dicating that the fusion was present (red) and 0 indicating that the fusion wasabsent (black). (B) Whole-transcriptome profiling failed to uncover the con-nection between T3 and RH30. (C) Venn diagram showing the overlap between1,075 up-regulated genes in T3 and RH30 compared with T1, and 1,072 PAX3–FOXO1 downstream target genes derived from published ChIP-seq experiments(10). (D) Gene Set Enrichment Analysis (26) confirmed the enrichment of PAX3–FOXO1 targets in the commonly up-regulated genes in T3 and RH30.

13128 | www.pnas.org/cgi/doi/10.1073/pnas.1612734113 Xie et al.

Dow

nloa

ded

by g

uest

on

June

26,

202

0

Page 4: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

validated fusion RNAs found in RH30 cells by RNA-seq. Re-markably, all of these fusions were detected at the day-2 timepoint (18/18) and day-6 time point, except for METTL21B–TSFM(17 of 18). Interestingly, seven fusion RNAs, including PAX3–FOXO1, were almost exclusively detected at days 2 and 6 (Fig. 4 A

and B). Clustering analysis with the quantitative RT-PCR (qRT-PCR) expression data for the 18 chimeric RNAs revealed theclosest degree of similarity of RH30 with days 2 and 6, the twotime points when PAX3–FOXO1 was detected (Fig. 4C). Of note,the disappearance of seven fusion RNAs at day 4 and reappear-ance at day 6 suggests a corresponding wavering pattern of theseother fusion RNAs. On one hand, this wavering pattern reflectsadditional unknown factors involved in the differentiation process;on the other hand, it provides an even stronger argument for theclose connection of PAX3–FOXO1-expressing myogenic cellswith RH30.In a separate validation experiment with a commercial mes-

enchymal stem cell, PAX3–FOXO1 was mostly detected at day 6and at lower levels detected at day 2 of differentiation (Fig. S5).We again performed qRT-PCR analyses on 18 validated fusionRNAs identified by RNA-seq in RH30 cells. Sixteen of thesefusions were also found at the day-6 time point (Fig. S5).

The Chimeras Are Not Downstream Targets of PAX3–FOXO1. The co-occurrence of multiple chimeric RNAs expressed by both RH30and certain time point(s) during normal myogenesis is highlyunlikely to be caused by stochastic chance. Because of the im-portance of PAX3–FOXO1 in ARMS, we wanted to examine thepossibility that the expression of other fusions is triggered byPAX3–FOXO1. We generated MSCs stably expressing PAX3–FOXO1 (MSC/P3F). The level of PAX3–FOXO1 is similar tothat in RH30 cells at both RNA and protein levels (Fig. S6). Wethen measured expression of these 18 fusion RNAs in MSC/P3Fcells. Only one fusion, LINC00545–MEDAG, was apparentlyinduced by PAX3–FOXO1 in this experiment (Fig. 4D). Thisfinding was also confirmed in 293T cells and a human foreskinfibroblast cell line that were transiently transfected with aPAX3–FOXO1-expressing plasmid. Therefore, we conclude thatall of the other fusions detected at the same time point(s) ofmyogenesis as PAX3–FOXO1 were not simply due to the tran-scriptional activity of PAX3–FOXO1.

Chimeric RNAs in Sarcoma Cell Lines and Clinical Samples. We thenexamined the fusion RNAs in other rhabdomyosarcoma (RMS) celllines, including PAX3–FOXO1-positive and -negative lines. Most ofthe fusion RNAs could be detected in at least one additional RMScell line (RH30 and RMS-13 were derived from the same patient;ref. 27) (Fig. 5A). We also tested these fusions in eight cell linesbelonging to six other sarcoma cell types: 402-91 (myxoid liposarcoma),JN-DSRCT (desmoplastic small round cell sarcoma), SUCCS-1(clear cell sarcoma), OsACL (osteosarcoma), A2243 (synovialsarcoma), and A673, TTC-466, and TC32 (Ewing sarcoma). Nineof the 18 fusions were only seen in RMS cells. Interestingly, theseven fusions that matched PAX3–FOXO1 expression pattern inFig. 4A were very specific to RMS cells (WARS2–TBX15 hadweak signal in other sarcoma lines) (Fig. 5 A and B).To rule out the possibility of cell culture artifact, we measured

the fusions in a panel of clinical RMS tumor cases consisting offive PAX3–FOXO1 (P3F)-positive cases, three PAX7–FOXO1(P7F)-positive cases and six fusion-negative RMS cases (Fig. S7).As controls, we included fetal and adolescent muscle biopsies, aswell as RH30 cells. To test for RMS specificity, we also includedclinical samples of other “small round blue cell” tumors. Con-sistent with the cell line data (Fig. 5 A and B), several fusionRNAs, including MEGF11–TIPIN, SYNE3–LINC00341, andTHSD4–SERHL2 were found to be specific to RMS. They weredetected in P3F, P7F, and PF-negative RMS cases, but not insynovial sarcoma, liposarcoma, desmoplastic small round cellsarcoma, osteosarcoma, or Ewing sarcoma samples (Fig. 5C).The presence of the same set of fusion RNAs in both normal

myogenesis and RMS could be attributed to the possibility that theycontribute directly to tumorigenesis (i.e., beneficial to the tumorcells, as in the case of PAX3–FOXO1). It is also possible that they aresimply bystanders and are not detrimental to the tumor cells. Notsurprisingly, many fusions were also found in the fetal musclesample, but most were undetected or only weakly expressed in

Fig. 4. Eighteen chimeric RNAs shared by RH30 and PAX3–FOXO1-expressingmyogenic cells. (A) Myogenic differentiation was induced in hES-MSCs. Sam-ples were collected every other day. All validated chimeric RNAs found in RH30cells by RNA-seq were tested by RT-PCR in myogenesis samples. All were de-tected at day 2. All except one were detected at day 6. (B) qRT-PCR analyses ofsix chimeric RNAs that were almost exclusively detected at the same timepoints as PAX3–FOXO1. Expression levels of these chimeric transcripts werenormalized toGAPDH and then to the level of RH30. (C) Hierarchical clusteringof the expression data for these 18 fusion RNAs at various time points. Days 2and 6 clustered closest to RH30. (D) Stable MSCs expressing PAX3–FOXO1(MSC-P3F) were established. RT-PCR results for representative fusions in MSC-P3F, parental MSC, and RH30 are shown here. Only one of the assayed chimericRNAs was induced by PAX3–FOXO1.

Xie et al. PNAS | November 15, 2016 | vol. 113 | no. 46 | 13129

GEN

ETICS

Dow

nloa

ded

by g

uest

on

June

26,

202

0

Page 5: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

the adolescent muscle sample. This finding is consistent with thetransient nature of these fusion RNAs observed in the MSC dif-ferentiation system. Several chimeric RNAs, including MEGF11–TIPIN, SYNE3–LINC00341, THSD4–SERHL2, and ROBO1–XPC,displayed higher expression in RH30, as well as in certain indi-vidual RMS cases, compared with the fetal muscle sample. Thisdifferential expression could be because of the greater degree ofhomogeneity, in terms of cell types expressing the fusions, in thetumor samples compared with the fetal sample. Conversely, itis also possible that the fusions are beneficial to tumorigenesis,leading to their selective preservation. We are carrying out func-tional studies on several of such fusions. In contrast, several chi-meric RNA, including LINC00545–MEDAG, had similar levels ofexpression in the fetal samples and the clinical samples (Fig. S8),suggesting that they are most likely bystanders.

DiscussionARMS is an aggressive soft-tissue cancer that most commonlyoccurs in children and adolescents. This cancer often occurs inlimbs and expresses markers of immature muscle (MYOD andMYOG). Even with collaborative national trials of multimodaltherapy, the outcome for patients with advanced-stage ARMShas not improved over the last few decades. The cell of origin forthis tumor is currently not clear. The lack of an answer to thisbasic question has impeded the understanding of the tumoretiology. This issue may also be clinically relevant, because arecent study demonstrated that the lineage of origin significantlyinfluences sensitivity to target therapeutics (20).Over the years, various experimental systems have been used

to address the cell-of-origin question, but they have yieldedconfusing and conflicting results (19, 20). Some evidence sug-gests that MSCs may be the cell of origin for ARMS (19); someevidence supports myogenic satellite cells (28); and others pro-pose terminally differentiated muscle cells or fetal myoblasts (20,29). The disagreement partially stems from the traditional“candidate cell” approach: In both cell culture and animalmodel systems, certain types of cells were selected and testedto determine whether they could be transformed upon theintroduction of “oncogenic” factors, such as PAX3–FOXO1.However, cells that are accessible and can be transformed arenot necessarily the cells from which a particular tumor arises.The best support for this argument is that throughout thehistory of cancer research, the mouse fibroblast cell line NIH3T3 has been widely used to test different oncogenic factorsfound in different tumor types. We now propose a differentstrategy by monitoring the cancer-signature chimeric RNAsassociated with the tumor throughout the normal muscle celldifferentiation process. We propose that the pattern of fusionRNA expression can be used as a fingerprint to identify can-didates for cell of origin. Here, chimeric RNA profiling is usedto provide hints for the etiology of a disease.The fact that PAX3–FOXO1 is expressed both in certain cells

during normal myogenesis and in ARMS supports the connec-tion between the two. However, an alternative hypothesis is thatinappropriate PAX3–FOXO1 expression in a lineage that does notnormally express this fusion may be oncogenic. We found that anadditional 17 chimeric RNAs were shared between the two cells andthat seven fusions had identical transient expression pattern asPAX3–FOXO1, strongly arguing against this ectopic expressiontheory, because it is highly unlikely that an unrelated cell populationwould acquire all these fusions associated with normal myogenesis.Another important issue is the finding that, when expressed ec-topically in many cell types, the PAX3–FOXO1 protein, as well asother oncogenic fusion products, triggers growth arrest and/or celldeath (30, 31). The fact that endogenous PAX3–FOXO1 and otherfusion RNAs are found in some normal myogenic cells supports thepossibility that ARMS may originate from these cells, because theyare permissive for, if not responsive to, these fusions.The fusion transcriptome clustering with a binary input

(presence or absence of a chimera) performed as well as tradi-tional transcriptome clustering in human body maps (Fig. 1). It

uncovered a connection between normal myogenesis and ARMS(Fig. 3A) that could not be revealed directly by using whole-transcriptome analyses (Fig. 3B). However, the connection be-tween the fusion transcript and gene fusion at the chromosomallevel is not clear. In addition, more studies are needed to un-derstand the regulation mechanism of the fusion transcripts,including their transient nature, and their specificity.

Materials and MethodsCell Lines and Culture Conditions. MSCs were obtained from the TulaneUniversity Center for Gene Therapy. hES-MSCs were obtained fromMillipore.

RNA-Seq and Bioinformatics Analyses. MSC muscle differentiation and RH30cell RNA-seq were performed by Axeq. The deep-sequencing data were

Fig. 5. Chimeric RNAs in sarcoma cell lines and clinical RMS samples. (A) Atotal of 18 chimeric RNAs were examined by RT-PCR in four additionalrhabdomyosarcoma cell lines and eight non-RMS sarcoma cell lines. A673,TTC-466, and TC32 are Ewing sarcomas; 402-91, myxoid liposarcomas; JN-DSRCT, desmoplastic small round cell sarcoma; SUCCS-1, clear cell sarcoma;OsACL, osteosarcoma; and A2243, synovial sarcoma. (B) qRT-PCR analysisshows 10 chimeras that are specific to RMS cell lines. Of note, all seven chi-meras that have the same wavering expression pattern as PAX3–FOXO1 dur-ing MSC myogenesis are specific to RMS. (C) A total of 18 chimeric RNAs weremeasured by qRT-PCR in 14 clinical rhabdomyosarcoma samples, 2 noncancermuscle biopsies, and RH30. In addition to the RMS cases, two synovial sarcoma(SS), two liposarcoma (LS), two desmoplastic small round cell sarcoma (DS),two osteosarcoma (OS), and one Ewing sarcoma (EWS) clinical samples wereused as controls. Three RMS-specific examples are shown here.

13130 | www.pnas.org/cgi/doi/10.1073/pnas.1612734113 Xie et al.

Dow

nloa

ded

by g

uest

on

June

26,

202

0

Page 6: Fusion transcriptome profiling provides insights into ... · Fusion transcriptome profiling provides insights into alveolar rhabdomyosarcoma Zhongqiu Xiea, Mihaela Babiceanua,1, Shailesh

mapped to human genome version hg19 and analyzed by using the softwareSOAPfuse (32). The human tissue RNA-seq was analyzed by EricScript (24).The identified chimeric RNAs were presented by using Circos as described(33). For the heat-map generation and clustering, MultiExperimentViewersoftware was used.

RT-PCR and Sanger Sequencing. Fusion candidates were validated at the RNAlevel by standard RT-PCR or qRT-PCR with primers listed in Table S2. After RT-PCR and gel electrophoresis, all purified bands were sent for Sanger sequencing.

Ethics Approval and Consent to Participate. All RMS tumor samples were dei-dentified. Expression studies of these tumor samples was designated as exemptby the National Institutes of Health Office of Human Subjects Research.

ACKNOWLEDGMENTS. We thank Drs. Carola Ponzetto and Riccardo Taulli forproviding pBABE-PAX3-FOXO1 construct; Dr. Bishwanath Chatterjee for technicalassistance; Drs. Dutta, Liang, Triche, Biegel,Mandahl, Ladanyi, Epstein, Shapiro, andTronick for providing us the tumor cell lines; and Loryn Facemire and Fuming Yi fortheir technical support. This work is supported by St. Baldrick’s V Scholar grant.F.G.B. is supported by the intramural program of the National Cancer Institute.

1. Rabbitts TH (1994) Chromosomal translocations in human cancer. Nature 372(6502):

143–149.2. Heim S, Mitelman F (2008) Molecular screening for new fusion genes in cancer. Nat

Genet 40(6):685–686.3. Rowley JD (1973) Letter: A new consistent chromosomal abnormality in chronic my-

elogenous leukaemia identified by quinacrine fluorescence and Giemsa staining.

Nature 243(5405):290–293.4. Druker BJ, et al. (2001) Efficacy and safety of a specific inhibitor of the BCR-ABL ty-

rosine kinase in chronic myeloid leukemia. N Engl J Med 344(14):1031–1037.5. Tomlins SA, et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor

genes in prostate cancer. Science 310(5748):644–648.6. Shaw AT, et al. (2011) Effect of crizotinib on overall survival in patients with advanced

non-small-cell lung cancer harbouring ALK gene rearrangement: A retrospective

analysis. Lancet Oncol 12(11):1004–1012.7. Soda M, et al. (2007) Identification of the transforming EML4-ALK fusion gene in non-

small-cell lung cancer. Nature 448(7153):561–566.8. Liu Y, Hernandez AM, Shibata D, Cortopassi GA (1994) BCL2 translocation frequency

rises with age in humans. Proc Natl Acad Sci USA 91(19):8910–8914.9. Biernaux C, Loos M, Sels A, Huez G, Stryckmans P (1995) Detection of major bcr-abl

gene expression at a very low level in blood cells of some healthy individuals. Blood

86(8):3118–3122.10. Uckun FM, et al. (1998) Clinical significance of MLL-AF4 fusion transcript expression in

the absence of a cytogenetically detectable t(4;11)(q21;q23) chromosomal trans-

location. Blood 92(3):810–821.11. Qin F, et al. (2015) Discovery of CTCF-sensitive Cis-spliced fusion RNAs between ad-

jacent genes in human prostate cells. PLoS Genet 11(2):e1005001.12. Magrangeas F, et al. (1998) Cotranscription and intergenic splicing of human galac-

tose-1-phosphate uridylyltransferase and interleukin-11 receptor alpha-chain genes

generate a fusion mRNA in normal cells. Implication for the production of multido-

main proteins during evolution. J Biol Chem 273(26):16005–16010.13. Frenkel-Morgenstern M, et al. (2012) Chimeras taking shape: Potential functions of

proteins encoded by chimeric RNA transcripts. Genome Res 22(7):1231–1242.14. Carrara M, et al. (2013) State of art fusion-finder algorithms are suitable to detect

transcription-induced chimeras in normal tissues? BMC Bioinformatics 14(Suppl 7):S2.15. Babiceanu M, et al. (2016) Recurrent chimeric fusion RNAs in non-cancer tissues and

cells. Nucleic Acids Res 44(6):2859–2872.16. Gingeras TR (2009) Implications of chimaeric non-co-linear transcripts. Nature 461(7261):

206–211.17. Li H, Wang J, Mor G, Sklar J (2008) A neoplastic gene fusion mimics trans-splicing of

RNAs in normal human cells. Science 321(5894):1357–1361.18. Li H, Wang J, Ma X, Sklar J (2009) Gene fusions and RNA trans-splicing in normal and

neoplastic human cells. Cell Cycle 8(2):218–222.19. Charytonowicz E, Cordon-Cardo C, Matushansky I, Ziman M (2009) Alveolar rhabdo-

myosarcoma: Is the cell of origin a mesenchymal stem cell? Cancer Lett 279(2):126–136.20. Abraham J, et al. (2014) Lineage of origin in rhabdomyosarcoma informs pharma-

cological response. Genes Dev 28(14):1578–1591.21. Yuan H, et al. (2013) A chimeric RNA characteristic of rhabdomyosarcoma in normal

myogenesis process. Cancer Discov 3(12):1394–1403.

22. Shapiro DN, Sublett JE, Li B, Downing JR, Naeve CW (1993) Fusion of PAX3 to amember of the forkhead family of transcription factors in human alveolar rhabdo-myosarcoma. Cancer Res 53(21):5108–5112.

23. Klein G, Klein E (1986) Conditioned tumorigenicity of activated oncogenes. CancerRes 46(7):3211–3224.

24. Benelli M, et al. (2012) Discovering chimeric transcripts in paired-end RNA-seq data byusing EricScript. Bioinformatics 28(24):3232–3239.

25. Cao L, et al. (2010) Genome-wide identification of PAX3-FKHR binding sites inrhabdomyosarcoma reveals candidate target genes important for development andcancer. Cancer Res 70(16):6497–6508.

26. Subramanian A, et al. (2005) Gene set enrichment analysis: A knowledge-based ap-proach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA102(43):15545–15550.

27. Hinson AR, et al. (2013) Human rhabdomyosarcoma cell lines for rhabdomyosarcomaresearch: Utility and pitfalls. Front Oncol 3:183.

28. Calhabeu F, Hayashi S, Morgan JE, Relaix F, Zammit PS (2013) Alveolar rhabdomyosar-coma-associated proteins PAX3/FOXO1A and PAX7/FOXO1A suppress the transcriptionalactivity of MyoD-target genes in muscle stem cells. Oncogene 32(5):651–662.

29. Galindo RL, Allport JA, Olson EN (2006) A Drosophila model of the rhabdomyosar-coma initiator PAX7-FKHR. Proc Natl Acad Sci USA 103(36):13439–13444.

30. Lessnick SL, Dacwag CS, Golub TR (2002) The Ewing’s sarcoma oncoprotein EWS/FLIinduces a p53-dependent growth arrest in primary human fibroblasts. Cancer Cell1(4):393–401.

31. Xia SJ, Barr FG (2004) Analysis of the transforming and growth suppressive activitiesof the PAX3-FKHR oncoprotein. Oncogene 23(41):6864–6871.

32. Jia W, et al. (2013) SOAPfuse: An algorithm for identifying fusion transcripts frompaired-end RNA-Seq data. Genome Biol 14(2):R12.

33. Krzywinski M, et al. (2009) Circos: An information aesthetic for comparative geno-mics. Genome Res 19(9):1639–1645.

34. Grigoriadis AE, Heersche JN, Aubin JE (1988) Differentiation of muscle, fat, cartilage,and bone from progenitor cells present in a bone-derived clonal cell population:Effect of dexamethasone. J Cell Biol 106(6):2139–2151.

35. Ju YS, et al. (2011) Extensive genomic and transcriptional diversity identified throughmassively parallel DNA and RNA sequencing of eighteen Korean individuals. NatGenet 43(8):745–752.

36. Fagerberg L, et al. (2014) Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol CellProteomics 13(2):397–406.

37. Sigova AA, et al. (2013) Divergent transcription of long noncoding RNA/mRNA genepairs in embryonic stem cells. Proc Natl Acad Sci USA 110(8):2876–2881.

38. Chen A, et al. (2014) E-cadherin loss alters cytoskeletal organization and adhesion innon-malignant breast cells but is insufficient to induce an epithelial-mesenchymaltransition. BMC Cancer 14:552.

39. Asmann YW, et al. (2011) A novel bioinformatics pipeline for identification andcharacterization of fusion transcripts in breast cancer and normal cell lines. NucleicAcids Res 39(15):e100.

40. Kang BH, Jensen KJ, Hatch JA, Janes KA (2013) Simultaneous profiling of 194 distinctreceptor transcripts in human cells. Sci Signal 6(287):rs13.

41. Saeed AI, et al. (2003) TM4: A free, open-source system for microarray data man-agement and analysis. Biotechniques 34(2):374–378.

Xie et al. PNAS | November 15, 2016 | vol. 113 | no. 46 | 13131

GEN

ETICS

Dow

nloa

ded

by g

uest

on

June

26,

202

0