Transcriptome analyses of tumor-adjacent somatic tissues
reveal genes co-expressed with transposable elements
Nicky Chung1,*, G. M. Jonaid1,*, Sophia Quinton1,*, Austin Ross1,*, Corinne E. Sexton1,
Adrian Alberto2, Cody Clymer2, Daphnie Churchill2, Omar Navarro Leija2 and Mira V.
Han1,3,§
1 School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
2 Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
3 Nevada Institute of Personalized Medicine, Las Vegas, NV 89154, USA

* These authors contributed equally to this work
§ Corresponding author
Email addresses:
bioRxiv preprint doi: https://doi.org/10.1101/385062; this version posted June 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract
Background
Despite the long-held assumption that transposons are normally only expressed in the germ-line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in the somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals is unknown, and the co-expression between TEs and host gene mRNAs has not been examined.
Results
Here we report the variation in TE derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, there are individual TE loci that exhibit tissue-specific expression patterns, when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented gene members of the TE modules, showing positive correlation across multiple tissues. But we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies.
Conclusions
We find unexpected variation in TE derived transcripts, within and across non-tumorous tissues. We describe a broad view of the RNA state for non-tumorous tissues exhibiting a higher level of TE transcripts. Tissues with a higher level of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs, and lower RNA levels of immune genes.
Keywords
Transposon, TE, L1HS, RNA-seq, co-expression, mitochondria, KRAB zinc finger
Background
Although transposable elements (TEs) have been studied for a long time, their
ubiquitous and highly tissue-specific expression patterns are starting to be appreciated
only recently. The fact that TEs compose close to 40% of the human genome is
frequently emphasized, but the fact that there is observable amount of TE derived
transcripts in human RNA-seq data has mostly been ignored or regarded as a nuisance
without any functional relevance [1]. LINE elements have long been thought to be
expressed only in the germline cells [2–4]. But, both full-length and partial transcripts of
LINEs are frequently found in the somatic cells [4–6] with large variation in expression
levels across tissue types, and among different individuals [7]. The level of TE
expression is especially pronounced in cancer cells [8–11], and cell lines [12], but is
also observed in neurogenesis [13] and normal somatic tissue. Faulkner et al. in 2009,
was the first study to provide a global picture of the significant contribution of
retrotransposons to human transcriptome in multiple tissue types [14]. This report
showed that 6-30% of transcripts had transcription start sites located within
transposons, and these transposons were expressed in a tissue-specific manner and
influenced the transcription of nearby genes. The results were extended by Djebali et al.
in 2012 showing again the tissue-specificity of transposon expression, and that most of
these transcripts are found in the nuclear part of the cell [15]. In addition to the
tissue-specific expression of TEs, important regulatory roles for TEs are emerging (reviewed in
[16]). Observations include contribution to transcription start sites [14], source of
transcription factor binding sites [17], source of long non-coding RNAs [18], active
transcription during early development [19], and even critical function similar to long
non-coding RNAs that guide chromatin-remodeling complexes to specific loci in the
genome [20].
Although there are many reports of TE expression in the somatic cells, there is still a
large gap in our understanding of how TE expression is repressed and de-repressed in
human somatic cells. Based on what we have learned so far, TE expression is regulated
through multiple layers, consisting of transcription factors, epigenetic modification,
PIWI-interacting RNAs (piRNAs), RNA interference (RNAi), and posttranscriptional host
factors. Recently, two different approaches of genome-wide screening have identified
proteins that regulate different aspects of the activities of LINE elements. A CRISPR–Cas9
screen was used to identify proteins that restrict LINE activity [21]. The protein MORC2
and the human silencing hub (HUSH) complex were shown to selectively bind
evolutionarily young, full-length LINEs located within euchromatic environments, and
promote deposition of histone H3 Lys9 trimethylation (H3K9me3) for transcriptional
silencing [21]. And through proteomics approaches, two studies have recently identified
the localization of ORF1 and ORF2 proteins and their interacting partners [22], and the
timing of the entrance of the ORF2 protein complex into the nucleus [23]. But, no study
has yet examined the correlation in transcript levels of host mRNA and transposon RNA.
Recently, high-throughput RNA-seq data of various types of cancer samples and their
normal counterparts have become available in The Cancer Genome Atlas (TCGA)
[24–26]. By focusing on the non-tumorous tissue samples from TCGA, we can access
thousands of natural experiments across various types of tissues that show variation in
TE transcript levels, and obtain a global picture of TE expression and regulation in
humans. An important strength of the TCGA dataset is the large number of samples
collected for each tissue type and the high depth of the RNA-seq experiment, with a
median of about 150M reads per sample, which is several times larger than a usual
RNA-seq library. The variation in TE transcript levels observed in multiple samples
within each tissue allowed us to analyze the co-expression patterns between host genes
and TEs for the first time. We hypothesized that genes that regulate the transcription
level of TEs would show correlation in expression levels with the TE transcripts. Since
the samples are collected from fresh-frozen tissues, TE transcript levels are observed in
vivo, complementing the studies that focus on retrotransposition assays or transposon
expression in human cell lines.
We first summarize the survey of TE expression variation found in the RNA-seq data
from 697 samples of cancer-adjacent non-tumorous tissue. We confirm the earlier
findings that TE expression varies across tissue types. Transcript levels of individual TE
loci are highly tissue specific and within each family only a few individual loci are highly
expressed, contributing to the bulk of the transposon transcripts at the family level. We
also find large variation in total TE transcript level across individual samples within
each tissue type.
Although transposons have strong tissue-specific patterns at the locus level, we also
found that the majority of TEs show global co-expression at the family level across
samples. By analyzing the co-expression between these TEs and individual genes, we
found co-expression modules of TEs and genes replicated across tissues.
Results
TE derived transcripts are quantified across 16 tissues and 697 samples of tumor adjacent controls.
We re-aligned and quantified TE derived transcripts from the RNA sequencing data of
697 samples across 16 tissues collected as non-tumorous controls for the TCGA project
(Supplementary Table 1). The library sizes for these samples range from 50M reads at
the minimum, to up to 390M reads, with a median at about 149M reads (75M pairs).
Although all tissues included in this study were sequenced using the HiSeq 2000
platform, esophagus and stomach samples were sequenced separately at British
Columbia Genome Sciences Centre (BCGSC), with higher sequencing depth on average
(median 227M reads). The proportion of reads that do not map to annotated genes was
different between the later samples sequenced at University of North Carolina at Chapel
Hill (UNC), and the earlier BCGSC sequenced samples, with BCGSC samples having more
reads (median 177M) not mapping to annotated genes and discarded, while UNC
samples had fewer reads discarded (median 97M), possibly due to the difference in poly-A
enrichment protocol (MultiMACS mRNA isolation kit vs. TruSeq RNA Library Prep Kit
[27]) among many other differences, including read length. Because of these differences,
when comparing across tissue samples, we had to consider esophagus and stomach
tissues separately, and they could not be compared against the rest of the tissues.
Despite the differences in the overall sequencing depth and overall proportion of reads
mapping to genes, we found that the DESeq2 normalization method normalizes the
reads effectively and the correlation due to library size disappears after normalization
within tissues (Supplementary Figure 1). Our co-expression analysis is done within each
tissue separately. We also replicate our results found in esophagus and stomach with
similar results found in at least one other tissue.
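To illustrate why library size stops mattering after normalization, the following is a minimal sketch of the median-of-ratios size-factor estimate that underlies DESeq2's default normalization. It is not the DESeq2 implementation; the function names `size_factors` and `normalize` and the toy counts are assumptions introduced here for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 itself).

    counts: list of samples; each sample is a list of raw read counts,
    one per feature (gene or TE family), in the same feature order.
    """
    n_features = len(counts[0])
    # Log geometric mean per feature, using only features that have
    # non-zero counts in every sample.
    log_geo_means = {}
    for j in range(n_features):
        column = [sample[j] for sample in counts]
        if all(c > 0 for c in column):
            log_geo_means[j] = sum(math.log(c) for c in column) / len(column)
    if not log_geo_means:
        raise ValueError("no feature has non-zero counts in every sample")
    # The size factor of a sample is the median ratio of its counts
    # to the per-feature geometric means.
    factors = []
    for sample in counts:
        log_ratios = [math.log(sample[j]) - m for j, m in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return [[c / f for c in sample] for sample, f in zip(counts, factors)]


# Hypothetical toy data: sample 2 is sample 1 sequenced twice as deeply.
toy = [[10, 20, 30, 0], [20, 40, 60, 5]]
print(size_factors(toy))
```

In this toy case the second sample receives a size factor twice that of the first, so the normalized counts of the shared features coincide even though the raw library sizes differ.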
Although we find reads mapping to TEs in all the samples that we have examined, the
overall transcripts coming from TEs are still a relatively tiny proportion of the total
library. The total number of reads mapping to TEs ranged from 137K to 2.1M with a
median of 615K for UNC samples, and ranged from 282K to 3.3M with a median of 835K
for BCGSC samples. This count excludes some of the potential read-thru transcripts as
described in the next section. The total read counts across all TEs amount to about 1.2%
(UNC) or 1.7% (BCGSC) of the total reads mapping to known genes in non-tumorous
tissue samples.
TE reads originating from pre-mRNAs or retained introns are corrected by comparing the read depths of the flanking introns.
There have been previous reports of transposon reads coming from pre-mRNA or
retained introns in the mature RNA of genes that contain TE sequences in their introns
[28]. The extent of this problem can be partially estimated by comparing the read
depths of the transposon to the read depths of the flanking introns. If the reads mapping
to TEs are part of the pre-mRNA or retained introns, we should see continuous mapping
of reads that span the introns flanking the TE of interest, and observe reads that map
across the intron-TE boundaries. We can also partially correct for this problem by
utilizing the read depths in the flanking introns to proportionally reduce the number of
total reads mapped to TEs. The approach is described below.
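$$R_{IL} = \frac{count_{IL}}{len_{IL} - read\_len}$$

$$count_{TE}' = \begin{cases} count_{TE} - count_{TE} \times \dfrac{R_{IL} + R_{IR}}{2 R_{TE}}, & \text{if } \dfrac{R_{IL} + R_{IR}}{2 R_{TE}} < 1 \\ 0, & \text{otherwise} \end{cases}$$

$TE$: focal TE
$IL$: intron left to TE. $IR$: intron right to TE.
$R_{IL}$: read depth of the intron left to the TE
$count_{IL}$: read counts mapped to the intron left to the TE (includes multi-mapped reads)
$len_{IL}$: length of the left intron
$read\_len$: length of the sequencing read
$count_{TE}'$: count of reads mapped to TE after the correction.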
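The flanking-intron correction can be sketched in a few lines of Python. This is only an illustration of the formula and not the authors' modified TEtranscripts code; the function names and the toy numbers are hypothetical, and the TE read depth is assumed here to be computed in the same way as the intron read depth, since the text defines only the latter explicitly.

```python
def read_depth(count, length, read_len):
    """Read depth as defined in the text: count / (length - read_len)."""
    effective = length - read_len
    if effective <= 0:
        raise ValueError("feature must be longer than the read length")
    return count / effective


def corrected_te_count(count_te, len_te, count_il, len_il,
                       count_ir, len_ir, read_len):
    """Discount TE reads in proportion to the mean flanking intron depth."""
    # Assumption: the TE depth uses the same definition as the intron depth.
    r_te = read_depth(count_te, len_te, read_len)
    if r_te == 0:
        return 0.0  # no reads on the TE, nothing to correct
    r_il = read_depth(count_il, len_il, read_len)
    r_ir = read_depth(count_ir, len_ir, read_len)
    ratio = (r_il + r_ir) / (2 * r_te)
    if ratio < 1:
        return count_te - count_te * ratio
    # Flanking introns are covered at least as deeply as the TE:
    # treat all TE reads as part of the pre-mRNA or retained intron.
    return 0.0


# Hypothetical example with 48-base reads.
print(corrected_te_count(count_te=100, len_te=348,
                         count_il=50, len_il=1048,
                         count_ir=150, len_ir=1048, read_len=48))
```

With these toy values the mean intron depth is 30% of the TE depth, so 30 of the 100 TE reads are discounted. When the introns are covered as deeply as the TE itself, the corrected count drops to zero.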
We modified the software TEtranscripts [29], following this approach, to discount the
TE read counts based on the read depths of the surrounding introns. By looking for
large differences after correcting by flanking read depth, we identified TEs that are most
frequently transcribed as part of the introns (Table 1). We also found cases where the
method corrected for erroneous TE quantifications due to TEs embedded within long
non-coding RNAs (lncRNAs). For example, an AluSx1 element on chromosome Y at
position 21153222 (AluSx1_dup59209) had very high transcript levels with an average
read count of 18863 in thyroid and head and neck tissue, but the Alu element is
embedded in a lncRNA gene called TTTY14. The reads mapping here are counted as
AluSx1 transcripts based on the UCSC TE annotation, but in the alignment, we see that
there are reads spanning the boundaries of AluSx1_dup59209, and almost all the reads
mapped in the region are uniquely mapped reads. It looks to be a case of an Alu
domestication, where an Alu insertion or a secondary duplication of an original Alu
insertion became part of a testis specific RNA gene [30]. Three examples of AluSx1, L2a,
and L1MA7, where the read counts for the transposons are reduced to zero, are
visualized in Supplementary Figure 2. AluSx1_dup59209 (chrY:21153222-21153521) is
embedded within an exon of gene TTTY14. L2a_dup21781 (chr2:113980079-113981081)
is embedded in an intron of PAX8. L1MA7_dup4297 (chr8:134015602-134015763) is
embedded in an intron of gene TG. In all three cases, read counts for the focal TEs were
reduced to zero after the correction described above, and the reads mapping to these
TEs did not contribute to the overall TE family count. If one is interested in transposable
element transcript level that is not part of a longer RNA molecule, it is important to take
into account the read depths of the flanking introns or exons, especially the latest
non-coding RNA gene annotations, when quantifying repeat element transcripts in the
genome.
Relying on uniquely mapped reads for repeat quantification results in quantification biased for mappable elements.
Due to the difficulty of mapping reads to repeat elements, one of the approaches taken
is to count only the reads that map to a unique position in the genome. But this
approach has repeatedly been shown to produce results that are worse than
expectation-maximization [31,32], and can lead to serious biases. When we counted only
uniquely mapped reads in our analysis, we not only threw away from 10.7% up to
45% (median 14.2%) of the total TE transcripts, but also threw away data in a biased
manner, such that we ended up “quantifying mappability” instead of “quantifying
transcripts”. This problem is especially pronounced when quantifying the young and
active L1HS element. To assess the effect of alignment on quantification, we tried two
different alignments, one based on the STAR aligner with up to 200 multi-mapped
positions, and the other based on Bowtie1 with only a single best alignment position,
discarding all reads that do not have a unique best mapping. Figure 1 shows two cases
that illustrate the limitations of either of the approaches. Figure 1a shows an example
of a full length L1HS locus on chromosome X:75453754-75459553 with high
mappability (48 base mappability shown at the top of the panel) due to many
accumulated mutations in its sequence. The top panel shows the Bowtie1 alignment,
allowing only uniquely mapping reads with a single best hit, and the bottom panel
shows the STAR alignment with multi-mapping up to 200 mappings per read. In the
STAR alignment, we can see erroneously split read alignments at the 3’ end that result
in reads mapping across greater than 10K distances, which shows a limitation of a splicing
oriented alignment software. The transcription for this element does not start at the 5’
end of the full length, but there is clear and unambiguous transcription starting from
about 1500 bases in, which is congruent between both alignments. Figure 1b
shows another full length L1HS locus on chromosome X:11953208-11959433, this
time a young element with very low mappability. Comparing the top and bottom panels,
we can see that with the unique mapping we are ignoring all the reads that are perfectly
mapping to this locus, but also map to multiple other locations. There is a huge pile-up
at the 5’ end of the full length. If we look at the reads mapping to the 5’ end of this locus,
their NH tags show numbers ranging from 2 to 4, meaning that they are mapping to two
to four alternative locations in the genome. Considering that L1HS loci containing the 5’
ends are more likely to be full length elements, these reads are more likely to be coming
from one of the few full length L1HS loci in the genome, but we end up ignoring these
reads if we are only counting uniquely mapping reads. On the other hand, with
multi-mapping, we end up quantifying with large uncertainty on whether the reads piled up in
this region are really transcribed from this particular locus. This is evident by the small
regions of extremely high pile-ups that reflect fragments that are found in the genome
with high frequency. We should point out, though, that we don’t count all the reads
aligning here at face value, since the expectation maximization algorithm will
down-weight the count of each read by the number of places it maps to.
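The down-weighting of multi-mapped reads can be made concrete with a toy expectation-maximization loop. The sketch below is a generic, simplified multi-read EM written for illustration; it is an assumption about the general technique and not the algorithm implemented in TEtranscripts, which is more involved. The function name and the toy reads are hypothetical.

```python
def em_multimap_counts(reads, n_iter=50):
    """Fractionally assign multi-mapped reads to loci with a simple EM.

    reads: list of lists; each inner list holds the distinct loci
    that one read aligns to (its length corresponds to the NH tag).
    """
    loci = sorted({locus for hits in reads for locus in hits})
    abundance = {locus: 1.0 for locus in loci}  # uniform starting point
    counts = dict(abundance)
    for _ in range(n_iter):
        counts = {locus: 0.0 for locus in loci}
        # E-step: split each read among its loci in proportion
        # to the current abundance estimates.
        for hits in reads:
            total = sum(abundance[locus] for locus in hits)
            for locus in hits:
                if total > 0:
                    counts[locus] += abundance[locus] / total
                else:
                    counts[locus] += 1.0 / len(hits)
        # M-step: the fractional counts become the new abundances.
        abundance = counts
    return counts


# Hypothetical example: two reads unique to locus A, one read shared by A and B.
reads = [["A"], ["A"], ["A", "B"]]
print(em_multimap_counts(reads, n_iter=1))   # first pass equals 1/NH weighting
print(em_multimap_counts(reads, n_iter=50))  # shared read shifts toward A
```

After the first pass the shared read contributes half a count to each locus, which is the plain "divide by the number of mapping positions" weighting. Further iterations move the shared read toward the locus that is also supported by unique reads, which is why loci with no unique support remain uncertain.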
Comparing the mappability of the two examples, we can see that the uniquely mapping
approach preferentially counts reads coming from older TE loci with higher
mappability. This can also be shown by correlating the locus level read counts for each
L1HS element against the length of uniquely mappable positions in each TE locus
(Figure 2a). For the unique mapping approach, we see there is significant correlation
between the locus level quantification and the total length of uniquely mappable
positions within that locus (p-value = 5.523e-07 for sum of read counts across all
samples, and p-value = 1.072e-13 for maximum read count among all samples). The
multi-mapping approach with Expectation Maximization does not show that bias for
uniquely mappable regions (p-value = 0.60 for sum and p-value = 0.08 for max)
(Figure 2b).
Consequently, there is limited correlation in the locus level quantification of L1HS
between the uniquely mapped reads and the multi-mapped reads (Figure 2c). This
shows the difficulty of quantifying young active elements, such as L1HS, using
genome-wide RNA-seq data. Due to these limitations, analysis on L1HS in this study has been
done at the family level. The family level quantification of L1HS still shows variability
based on the read mapping approach (Supplementary Figure 3e), but it shows stronger
correlation than the locus level quantification. We still wanted to utilize the abundance
of RNA-seq data available for studying L1HS transcription, and glean information on
L1HS from these data. Based on the observation that 3’ ends of L1HS are frequently
represented in fragmented L1HS loci, while the 5’ ends are more frequent in full length
L1HS loci, we decided to use the read counts of the 5’ end of the element as a measure
that better represents the transcript level of full-length L1HS transcripts in the sample.
All the following analyses on L1HS expression are based on the reads mapped to L1HS
sequences in the genome that align with the first 300 bases of the 5’ end of the L1HS
consensus sequence, allowing for multi-mapping.
On the other hand, we found that quantification of older elements showed very strong
correlation between the two approaches, unique mapping and multi-mapping, reflecting
the higher mappability of older elements in the genome (Supplementary Figure 3). For
the locus level co-expression analysis with the Zinc Finger Proteins, we limited our
analysis to older elements that are 100% uniquely mappable across their sequences with a
48-base read length. All of our main results are qualitatively replicated in the data with
uniquely mapped reads aligned with bowtie, except for the results regarding the 5’ end
of L1HS expression.
TE expression shows tissue-specific expression patterns at the locus level among somatic tissues
There have been multiple reports of tissue specific expression of TEs in the human
genome, starting from Faulkner et al. in 2009 [14] to Philippe et al. 2017 [12] more
recently. We also found highly distinct tissue specificity in TE transcripts in the TCGA
data, such that we could cluster each sample into its broader tissue grouping,
based on locus level TE expression patterns alone without relying on any genes at all.
Figure 3 shows the clustering of tissues for family level and locus level quantification of
LTRs, DNA transposons, SINEs and LINEs. We used normalized mutual information
between the different clustering results and the ground truth (the true tissue group) to
evaluate the quality of clustering. Normalized mutual information was compared for
clustering results based on gene expression, family level TE expression, locus level TE
expression and random assignments. We found that the locus-level TE expression was
as predictive of tissue groupings as the genes (Figure 3.i, Supplementary Table 2). LTRs,
DNA transposons, LINEs and SINEs gave similar clustering accuracy as the genes. The
TE family expression levels did not have enough information to cluster the tissues
correctly. We note here, that these are loci selected by the rank of variance in log2
normalized read count across all samples regardless of tissue type, and we haven’t done
any differential expression analysis to identify the markers that are the most
informative for accurate tissue classification. Thus, the classification performance we
observe here is not the optimal performance that we could get if we were to decide on
the markers based on a trained classifier. When we excluded TEs that are within 1K,
10K, and 100K of the start and ends of the genes, the accuracy declined, so part of the
tissue specificity is due to co-location with tissue specific genes. But, even when relying
on TEs 100K away from any known genes, we saw that tissue specific information was
largely retained. On the other hand, when we focused on younger elements, HERVs
within LTRs and young L1s within LINEs, there was a large reduction in information
content, especially for young L1s. Clustering based on locus level expression for L1HS,
L1PA2 and L1PA3 was not any better than clustering based on family level expression
of all LINEs. We suspect this is due to the lower locus level mappability and large
uncertainty in locus level expression quantification for young L1 elements. Figure 4
shows fifteen representative TE loci that show tissue specific expression. These loci are
chosen from TEs that are 100K away from the start or end of any gene.
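The clustering evaluation described above relies on normalized mutual information (NMI) between cluster assignments and the true tissue labels. The following is a minimal sketch of that measure. The text does not state which normalization was used, so the arithmetic mean of the two entropies is an assumption made here, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, predicted):
    """NMI between two labelings of the same samples.

    Assumption: mutual information is normalized by the arithmetic
    mean of the two entropies; other normalizations exist.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(truth)
    joint = Counter(zip(truth, predicted))
    count_t = Counter(truth)
    count_p = Counter(predicted)
    mi = 0.0
    for (t, p), n_tp in joint.items():
        mi += (n_tp / n) * math.log(n_tp * n / (count_t[t] * count_p[p]))
    denom = 0.5 * (entropy(truth) + entropy(predicted))
    # Both labelings put every sample in one group: treat as identical.
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy example with four samples from two tissues.
tissue = ["liver", "liver", "breast", "breast"]
print(normalized_mutual_information(tissue, [1, 1, 0, 0]))  # clusters match tissues
print(normalized_mutual_information(tissue, [0, 1, 0, 1]))  # clusters unrelated to tissues
```

NMI does not depend on how clusters are numbered, which makes it suitable for comparing unsupervised clusterings against tissue labels: a clustering that reproduces the tissues scores 1 and one that is independent of them scores 0.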
The granular levels of locus specific TE expression contained tissue-specific
information, but the overall transcript level of TE classes did not show significant
variation across tissues (Figure 5a). The higher expression levels for TEs seen for
esophagus and stomach are confounded with the differences in sequencing protocol
described above, so they are not directly comparable to the rest of the tissues. When
focusing on 300 bases in the 5’ end of L1HS, it showed some variation across tissues
with higher levels in the head and neck tissues and lower levels in the liver, consistent
with the previous observation in adult human tissues [6] and in human cell lines [12],
albeit with large within tissue variance (Figure 5b). Although we cannot directly
compare L1HS expression in esophagus and stomach to the rest of the tissues, we
can tell that there is clear transcription of the 5’ end of L1HS in esophagus and stomach.
Figure 5b shows the normalized read counts in log2 scale, with a median of more than
500 reads mapping to 300 bases at the 5’ end of L1HS for esophagus and stomach
(median library size 227M reads). There have been observations of full length L1HS
expressed in the adult esophagus and stomach tissue, at about 80% and 150% relative
to the levels in HeLa cells [6], and active L1 retrotransposition in premalignant
precursor lesions of esophageal adenocarcinoma [33].
Co-expression analysis of intergenic TEs identifies core TE modules and correlated Zinc Finger Proteins.
Co-expression network analysis is an appropriate approach to examine the
co-expression across different TE families and host genes together. In order to identify the
common gene/TE modules that are correlated across different tissues, we did a
consensus network analysis across tissues using the weighted gene co-expression
network analysis in the WGCNA package [34]. For the TE family transcripts in this
analysis, we only included intergenic TEs, i.e. we only counted reads mapping to TEs
that are 1 Kb away from any start and end of known genes. We identified 61 modules
across 11 groups of tissues, combining certain tissue types together as a broader group
(colon and rectum, esophagus and stomach, kidneys, lungs). Among the 20531 genes
and 992 TE families that were quantified in the 697 samples, 18670 genes and 923 TE
families had enough expression level and variation to be included in the network
analysis. Among those 19593 genes and TEs, 9599 genes and 658 TEs were clustered
into a module of co-expression, while 9336 did not belong to any defined module. The
list of modules, correlations between the modules, and topological adjacency matrix
that defines the modules are visualized for the breast tissue in Supplementary Figure 4.
Visualization for other tissues was similar, as we looked for consensus modules across
all tissues. There were only a few modules that contained TE transcripts: only seven
modules contained more than ten TE families within the module. Supplementary Table
3 shows the distribution of TE families in these seven modules. We considered modules
M8, M21, M38 and M45 as core TE modules, as their membership consisted mainly of
TE families (marked by * in Supplementary Figure 4).
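The restriction to intergenic TEs used for this network analysis (TEs 1 Kb away from any gene start and end) amounts to a simple interval filter. The sketch below is one possible reading of that criterion, in which a TE is kept only if it neither overlaps a gene nor lies within the minimum distance of a gene boundary. The function name, the tuple layout, and the coordinates are hypothetical and are not taken from the authors' pipeline.

```python
def is_intergenic(te, genes, min_dist=1000):
    """Return True if the TE is at least min_dist bases from every gene.

    te and each gene are (chromosome, start, end) tuples with
    half-open coordinates, i.e. start inclusive and end exclusive.
    """
    chrom, start, end = te
    for g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        # The TE is rejected if it overlaps the gene padded by
        # min_dist on both sides (this also rejects intronic TEs).
        if start < g_end + min_dist and end > g_start - min_dist:
            return False
    return True


# Hypothetical gene and TE coordinates.
genes = [("chr1", 10_000, 20_000)]
tes = {
    "TE_near": ("chr1", 20_500, 20_800),    # 500 bases past the gene end
    "TE_far": ("chr1", 21_500, 21_800),     # 1500 bases past the gene end
    "TE_intron": ("chr1", 12_000, 12_300),  # inside the gene
}
kept = [name for name, te in tes.items() if is_intergenic(te, genes)]
print(kept)
```

The same function with `min_dist` set to 10,000 or 100,000 would correspond to the stricter distance cutoffs mentioned for the tissue clustering analysis. A linear scan over genes is fine for a sketch; a genome-wide run would normally use sorted intervals or an interval tree.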
The correlation between closely related TE subfamilies is expected because reads from
transposons that map to sequences that are indistinguishable between subfamilies are
assigned to multiple subfamilies with proportional weight by TEtranscripts using an
Expectation-Maximization algorithm. Closely related families such as L1HS and L1PAs
also share common regulatory elements at the 5' end. But we find that the correlated
TE families in a TE module span different classes of TEs, and are replicated even when
counting uniquely mapped reads only. Considering there is no sequence similarity
between the SINEs, DNA transposons, LTRs and the LINEs, the correlation among these
diverse classes of transposons is probably due to a common regulation, or dys-regulation,
that is de-repressing these transposons at the same time. There have been reports of
such co-expression of ERVs and LINEs in cancerous tissues [35,36], possibly through
concordant hypomethylation [37].
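To make the proportional assignment concrete, the following is a toy Expectation-Maximization sketch for multi-mapped reads. It is a simplified illustration and not the TEtranscripts implementation: it ignores element length and any other refinements the real software may apply, and the function name and input layout are assumptions.

```python
def em_assign(read_hits, n_iter=100):
    """Fractionally assign reads to families with a toy EM.

    read_hits: list of lists; each inner list holds the distinct family
    names a read is compatible with (hypothetical input layout).
    Returns a dict family -> expected read count.
    """
    families = sorted({f for hits in read_hits for f in hits})
    # Start from uniform relative abundances.
    theta = {f: 1.0 / len(families) for f in families}
    counts = {f: 0.0 for f in families}
    for _ in range(n_iter):
        # E-step: split each read among its families in proportion to theta.
        counts = {f: 0.0 for f in families}
        for hits in read_hits:
            total = sum(theta[f] for f in hits)
            for f in hits:
                counts[f] += theta[f] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = len(read_hits)
        theta = {f: counts[f] / n_reads for f in families}
    return counts
```

Because ambiguous reads are shared between the families they match, closely related subfamilies end up with partly coupled counts, which is the source of the expected correlation described above.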
There was one class of host genes that were frequently found as members of the TE
co-expression modules, and they were the KRAB Zinc Finger Proteins (KZFPs). Table 2
shows the list of KZFPs that were identified as TE module members.
Expression of immune genes is negatively correlated with intergenic TE expression.
Once we identified modules consisting mostly of transposon families, we also examined
whether any co-expression modules were negatively correlated with TE modules. We
found two modules, M33 and M35, that showed consistent negative correlation across
tissues. The genes included in these modules were genes involved in the innate immune
system, interferon signaling, the immunoproteasome, etc. (Figure 6). Figure 6 shows the
enriched annotation terms detected for both modules through the Reactome database,
the top 30 genes with highest module membership for the two modules, and the
correlation plot between TEs in the TE modules M8, M21, M38 and M45, and genes in
modules M33 and M35 in the tissues breast (Figure 6d) and esophagus/stomach (Figure
6e). Only genes and TEs that show greater than 0.6 Pearson correlation with the
representative profile of the module in all tissues have been included in the correlation
plot. We observe high correlation within groups and contrasting negative correlation
between groups.
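The inclusion rule for the correlation plot (Pearson correlation above 0.6 with the module's representative profile in every tissue) can be sketched as follows. The data layout, with one expression vector per feature per tissue and one representative profile per tissue, is an assumption made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def consistent_members(expr_by_tissue, profile_by_tissue, threshold=0.6):
    """Features whose correlation with the profile exceeds `threshold` in all tissues.

    expr_by_tissue:    tissue -> {feature: [values per sample]}  (hypothetical layout)
    profile_by_tissue: tissue -> [representative profile per sample]
    """
    shared = set.intersection(*(set(e) for e in expr_by_tissue.values()))
    return [
        f for f in sorted(shared)
        if all(
            pearson(expr_by_tissue[t][f], profile_by_tissue[t]) > threshold
            for t in expr_by_tissue
        )
    ]
```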
Co-expression analysis including intronic TEs reveals negative correlation between intronic TE expression and mitochondrial gene expression.
When we include intronic TE transcripts in the overall TE expression levels, the
co-expression analysis leads to a different picture from the analysis of intergenic TEs. When
intronic TEs are included, a single module, N1, emerges as the dominant TE module,
containing 612 out of 848 TE families (72%) that were assigned a module membership.
In fact, N1 contains 72% of all TE families but only 2% of all genes.
Table 3 shows genes that are significantly correlated with module N1 in multiple
tissues. A pattern immediately noticeable is that there are many pseudogenes, intronic
transcripts, antisense RNAs, and long non-coding RNAs on the list. It looks like with
intronic TEs, we are detecting a cell state that is dysregulated in splicing or mRNA
quality control, and as a result, we are seeing a global elevation of pervasive
transcription that is generally non-functional. Multiple protein coding genes on the list
are involved in mRNA splicing regulation, such as NCRNA00201, an isoform of HNRNPU
which showed strong correlation with the intronic TE module in seven different tissues,
as well as CCNL2, LUC7 and LUC7L3, perhaps as a response to the dysregulated splicing.
Another interesting gene in the list is NKTR, hinting at the presence of immune cells in
the tissue samples with high intronic TE expression. This is in contrast with the negative
correlation we observe between immune genes and intergenic TE expression.
The module that was negatively correlated with the intronic TE module (N1) included
co-expression clusters consisting of mitochondrial proteins and ribosomal proteins
(N4). N4 was the only module that was consistently negatively correlated with N1, with
a correlation coefficient below -0.7 across all tissues. Figure 7 shows the correlation
plots between TEs in N1, and genes in the mitochondrial gene module N4, for breast and
esophagus. Enriched annotation terms for the genes found in the Reactome database
are centered around translation and mitochondria. One intriguing possibility may be
that the failed splicing and mRNA control is leading to a suppression of translation that
in turn leads to reduced RNA levels of mitochondrial genes and ribosome.
We again saw an enrichment of KZFPs as members of the intronic TE module N1, and
another module N10, that was positively correlated with N1 (Supplementary Table 4).
The list of KZFPs had some overlap with the KZFPs co-expressed with intergenic TE
modules, but there were some differences as well. We combined the 22 KZFPs in
modules N1 and N10 and examined whether there was any common transcription
factor binding for these genes found in the ENCODE ChIP-seq data with the EnrichR
database [38]. The regions near these KZFPs were enriched with binding of GABPA, a
regulator of nuclear encoded mitochondrial genes, in multiple cell lines (Supplementary
Figure 5). This was interesting, given the negative correlation observed between intronic
transposons and nuclear encoded mitochondrial gene expression described above.
Genes co-expressed with L1HS include genes regulating major signaling pathways, chromatin, and stress response.
Given the interest in the active element L1HS, and the uncertainty in L1HS
quantification, we decided to limit the quantification to the 5’ region of L1HS, and
examine the host genes that are specifically correlated to the expression of the 5’ region of
L1HS without regard to the co-expression modules. In order to control for the
correlation with other TEs, especially intronic TEs, we included the representative
profile of N1 as a covariate in our linear model. One concern with co-expression
analysis is positional overlap. There were 14 genes that overlapped with the L1HS loci
we were counting the reads from. Only 1 of the 14 genes, RAB3GAP2, showed significant
correlation with L1HS 5’, and was removed from the final list. 56 genes were identified
as negatively correlated, and 77 genes were identified as positively correlated with
L1HS 5’ in at least two tissues (Figure 8, Supplementary Table 5). Notable genes include
RASA1, RASA2, RRAS, EGFR and MAPK1, in the Ras-MAPK pathway; ECSIT, TAB3 and
TRAF6, regulators of the NF-κB pathway; RNASEH2C, a known L1HS repressor [39];
TET2, known to bind to and demethylate young L1s [40]; THAP7, a histone tail binding
transcription repressor [41]; and DDI2, a protease that cleaves and activates
NFE2L1/NRF1 [42]. Multiple genes in the respiratory electron transport pathway,
ECSIT, NDUFA1, NDUFA8, NDUFB10, NDUFB8, SURF1, UQCR11, UQCRB, were negatively
correlated with L1HS 5’, even after controlling for the covariation with intronic TEs, N1.
The whole list of genes is reported in Supplementary Table 5.
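The idea of controlling for the intronic TE module when testing a gene against L1HS 5’ can be illustrated with a small sketch. It is not the authors' model, whose exact specification is given in the Methods. It estimates the L1HS coefficient of an ordinary least-squares fit `gene ~ L1HS + N1` by first regressing N1 out of both variables, which gives the same coefficient as the joint fit (Frisch-Waugh-Lovell). All names are hypothetical.

```python
def residualize(y, x):
    """Residuals of the simple linear regression y ~ 1 + x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - beta * (a - mx) for a, b in zip(x, y)]

def adjusted_slope(gene, l1hs, n1_profile):
    """Slope of gene on L1HS after removing the linear effect of the N1 profile.

    Equals the L1HS coefficient in the least-squares fit gene ~ 1 + L1HS + N1.
    """
    gene_res = residualize(gene, n1_profile)
    l1hs_res = residualize(l1hs, n1_profile)
    return sum(a * b for a, b in zip(l1hs_res, gene_res)) / sum(a * a for a in l1hs_res)
```

A gene whose expression tracks only the N1 profile gets an adjusted slope near zero under this scheme, which is the behaviour the covariate is meant to produce.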
We checked whether our list of negatively correlated genes overlapped with
the genes identified through a CRISPR–Cas9 screen [21]. Of the 56 negatively correlated
genes, three genes, RNASEH2C, HAUS7 and RNF166, were also on the list of 253 secondary
screen hits. There was no overlap among the 77 positively correlated genes.
We also checked whether there were transcription factors known to bind to the L1HS
sequence [43] in our list. Of the 77 positively correlated genes, four genes, YY1, REST,
ELF1 and ZBTB33, were identified to bind to L1HS [43]. There was no overlap among the 56
negatively correlated genes. To check if the same transcription factors are regulating
the correlated genes and L1HS, we also checked what kind of TF binding is observed
upstream of our correlated genes. There was some enrichment of ENCODE
transcription factor binding upstream of our list of correlated genes (Supplementary
Figure 6), but except for YY1, the enriched TFs did not overlap with the list of Sun et al.
[43].
TE module expression is correlated with radiation exposure in thyroid tissue.
We examined whether any of the clinical variables were associated with the TE module
expression or the L1HS expression levels. We tested the variables age, days to death,
pathological stage, T staging, N staging, M staging, gender, radiation and race for each
tissue type. No variable was found to be associated with L1HS 5’ expression. Radiation
therapy was the only clinical variable associated with module N1 (intronic TE module)
expression in the non-tumorous tissue of thyroid (p-val = 0.00894, Supplementary
Figure 7).
Co-expressed TEs and KRAB-ZFPs show limited overlap with ChIP-seq binding
Based on the positive correlation observed among KZFPs and TE modules, and existing
literature on the role of KZFPs for TE repression, we decided to examine the correlated
expression of all pairs of 979 TE families and 366 KZFPs. The most striking pattern
observed was that KZFPs and TEs show overwhelmingly positive correlation and little
negative correlation. Chromosome 19, where the majority of the KZFPs are clustered, is
also the chromosome with the highest density of transposable elements. This unique
structure of chromosome 19 may lead to TEs embedded in KZFP genes being erroneously
identified as co-expressed. We avoid the confounding effect of positional overlap
between TEs and KZFPs by only counting reads mapping to TEs that are in the
intergenic region, at least 1 Kb away from any gene. There may be residual correlation due to
a shared genomic environment at a larger scale, such as the chromatin state. But that
doesn’t explain all the positive correlation, because when we look at the locus level
correlation, we find that the individual TE loci correlated with the ZNFs are scattered
across all chromosomes, and not necessarily enriched on chromosome 19.
The co-expression between KZFPs and TEs was observed across almost all TE families,
as 794 TE families had at least one co-expressed ZFP in at least one tissue. Certain ZFPs,
such as ZNF621, ZNF780B, ZNF84, ZNF33A, and ZNF662, showed correlation with a wide
range of TE families in multiple tissues. The TE-KZFP pairs HERVK14-int:ZNF814,
MER57A-int:ZNF621, and MSTB-int:ZNF41 were the most frequent pair-wise co-expression
observed between TE families and KZFPs, found positively correlated in six different
tissues. The ZFPs that were negatively correlated with TEs were ZNF511 and ZNF32,
but they are not classified as KZFPs as they do not have a KRAB domain.
We looked at the family level co-expression between TE families and KZFPs and tested
the overlap against the KZFP bound TE family enrichment reported in the ChIP-exo
study (GSE78099 [44]). We found that there is a statistically significant association
between co-expression and binding (p-value < 2.2e-16). But the number of overlapping
pairs was very small. Figure 9 shows the overlap between co-expression and binding
enrichment. We only mark the co-expression found in at least two tissues, and we have
omitted the TE-KZFP combinations that have neither co-expression nor binding
enrichment from the figure. The total number of combinations tested that overlap between the two
datasets is 200,889 (221 KZFPs x 909 TE families). 4138 pairwise co-expressions were
observed in at least two tissues. Of those, only 119 were enriched for binding in the
ChIP-exo study [44].
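The arithmetic behind these counts can be checked with a short sketch. The first function uses only numbers stated above. The second gives the overlap expected under independence, but it needs the total number of binding-enriched pairs, which is not reported in this section, so it is left as a parameter and not evaluated here. Function names are hypothetical.

```python
def overlap_summary(n_kzfp, n_te_families, n_coexpressed, n_coexpressed_and_bound):
    """Summarize family-level overlap between co-expression and binding."""
    total_pairs = n_kzfp * n_te_families
    return {
        "total_pairs": total_pairs,
        "coexpressed_fraction": n_coexpressed / total_pairs,
        "bound_fraction_of_coexpressed": n_coexpressed_and_bound / n_coexpressed,
    }

def expected_overlap(total_pairs, n_coexpressed, n_bound):
    """Expected number of pairs both co-expressed and bound if the two were independent."""
    return n_coexpressed * n_bound / total_pairs

# Values taken from the text: 221 KZFPs, 909 TE families,
# 4138 co-expressed pairs, 119 of them enriched for binding.
summary = overlap_summary(221, 909, 4138, 119)
```

The summary shows how both statements in the text can hold at once: an association can be highly significant while the overlapping pairs remain a small fraction of all co-expressed pairs.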
To check how the co-expression is observed at individual TE loci, we took the TE
family-KZFP pairs that show correlated expression, and further tested co-expression between
individual TE loci of the correlated TE family against the KZFP of interest. With
correlations at the locus level, we were able to examine the locus level co-expression
and compare it directly to the binding peaks reported in Imbeault et al. [44]. Of the 6258
co-expressed TE loci where the KZFP had been assayed with ChIP-exo, there were only
4 that were bound by the same KZFP. We do not have a good explanation for why there
is a lack of overlap between co-expression and binding at the locus level, when there
was at least some amount of overlap at the family level. It looks like the co-expression
we observe is a result of indirect interactions, and not necessarily direct binding.
We also observed that at the locus level, there was not a lot of overlap between the TEs
that are bound by KZFPs in [44], and the TEs that are expressed in the TCGA
non-tumorous tissues, regardless of the co-expression relationship with KZFPs. Here
“expressed” means there is at least one sample in our data with more than five reads
mapped to the TE locus, and “binding” means that there is a peak detected in the
GSE78099 ChIP-seq data overlapping with the TE locus with a ±250 bp buffer. Figure 10
shows the overall breakdown of the 4.5 million transposons annotated in the hg19 UCSC
RepeatMasker track. Statistically, there is more overlap than expected (p-value < 2.2e-16)
between binding and expression, but the TEs that are both
expressed in at least one sample and bound by at least one ZFP are a tiny proportion
(2.6%) of all TEs in the genome.
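The two definitions can be written down as a small sketch. It assumes half-open coordinates on a single chromosome and applies the 250 bp buffer to the TE locus. Whether the buffer was applied to the locus or to the peak is not stated in the text, so that choice, like the function names, is an assumption.

```python
def is_expressed(counts_per_sample, min_reads=5):
    """True if at least one sample has more than `min_reads` reads on the locus."""
    return any(c > min_reads for c in counts_per_sample)

def is_bound(te_start, te_end, peaks, buffer_bp=250):
    """True if any peak overlaps the TE locus extended by `buffer_bp` on each side.

    peaks: list of (start, end) intervals on the same chromosome,
    half-open coordinates assumed.
    """
    lo, hi = te_start - buffer_bp, te_end + buffer_bp
    return any(start < hi and end > lo for start, end in peaks)

def classify_locus(counts_per_sample, te_start, te_end, peaks):
    """Return one of the four expressed/bound categories for a TE locus."""
    expressed = is_expressed(counts_per_sample)
    bound = is_bound(te_start, te_end, peaks)
    if expressed and bound:
        return "expressed+bound"
    if expressed:
        return "expressed only"
    if bound:
        return "bound only"
    return "neither"
```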
One interesting pattern did emerge when we examined the overlap with epigenetic
marks of candidate cis-Regulatory Elements defined in the ENCODE data [45]. We were
more likely to see an enhancer-like mark (DNase+H3K27ac) for TE loci that are
expressed compared to non-expressed TEs, and we were more likely to see a
promoter-like mark (DNase+H3K4me3) for ZFP bound TE loci compared to TEs with no binding
(Figure 10). The 2.6% of TE loci that are expressed in at least one sample and bound by
at least one ZFP showed the highest proportion of both promoter-like marks and
enhancer-like marks. When we divided the TE loci into gene regions (genes including
introns and ±1 Kb flanking region) and intergenic regions (1 Kb away from start and end
of genes), the overall pattern remained the same, except that TE loci were twice as likely
to be expressed if they were close to genes compared to intergenic regions, and the TE
loci were twice as likely to overlap with the promoter-like marks
(Supplementary Figure 8). The enhancer-like marks showed no difference between
gene regions and intergenic regions, and the CTCF marks increased in the intergenic
regions.
Discussion
Limitations to the quantification and correction
Quantifying transposon transcripts is a difficult problem, due to their ambiguity in short
read mapping because of repeated content in the reference genome. Current state of the
art methods rely on Expectation-Maximization to account for the uncertainty in
multi-mapped reads [29]. Focusing only on uniquely mapped reads doesn’t really solve this
problem, and will lead to biased quantification, favoring older elements with higher
mappability. Scott et al. have demonstrated that by relying on unique mutations found
within individual L1HS loci, and by including sequences of non-reference polymorphic
L1HS loci, it is possible to identify the source of the L1HS activity with substantial
success [46]. But in our study we did not attempt to identify the individual loci of L1HS
transcription, and instead focused on the totality of reads mapping to regions of
annotated L1HS that align to the 5’ end of the L1HS consensus sequence.
Another complication in TE transcript quantification is that TEs are frequently
embedded within introns that are transcribed before they are processed, or sometimes
fail to be spliced out, or embedded within exons or non-coding RNAs that are expressed
in different conditions [28]. To account for this source of error, we introduced a method
to correct for TE reads coming from retained introns or pre-mRNA. Although we
observed large corrections for specific transposable elements embedded within introns,
the correction is not complete. We can tell this from the observation that the
co-expression profiles of intronic TEs are different from the co-expression profiles of
intergenic TEs away from the genes. The genes co-expressed with intronic TEs include
pseudogenes, intronic transcripts, anti-sense transcripts and genes with functions in
splicing. A more accurate approach would be to correct for the read counts from
retained introns before the EM algorithm based on the read depth of uniquely mapped
reads, and then run EM based on the corrected counts. But estimating the read depth of
the repeat region using uniquely mapped reads is a difficult problem. The effective
length of the uniquely mapped region is difficult to estimate, because again mappability
varies from locus to locus for any TE, depending on the unique mutations it has
accumulated. So, for this study, we decided to use the easier approach: run EM first
and probabilistically assign the TE reads, and then correct based on the expected read
depth across the length of the TE locus. An important future study would be to study the
mappability of individual TE loci carefully, including the known polymorphic sites, and
to design software for TE quantification that can take into account the mappability of
each locus in its EM algorithm, as well as correct for the retained introns while
considering the effective length of the uniquely mappable region within the TE.
Despite these limits, the main results of co-expression analysis were not affected by the
quantification. Most of the results in the paper were replicated when quantification was
done on uniquely mapped reads only. The only results that changed between the
multi-mapped approach vs. the unique mapping approach were the genes correlated with the
L1HS 5’ expression level. For those, we decided to report on results from the
multi-mapped reads rather than the unique reads, because of the bias of the uniquely mapped
reads we described above.
Stress, immune response and TE expression
Initially, when we started the project, our goal was to identify candidate genes involved
in transposon control, based on the co-expression analysis. But once the analysis was
done, the results were pointing to what induces TE expression, rather than what
suppresses TE expression. Among the genes known to function in transposon control,
RNASEH2C (Figure 8), HAUS7, and RNF166 [21] showed negative correlation with L1HS.
But several well-known genes with functions in transposon control, e.g. MORC2, SIRT6,
KAP1, SAMHD1, MOV10, ZAP, C12orf35 (human ortholog of RESF1), etc. are missing in
our list of significantly correlated transcripts. Instead, the major theme that emerged
from our results is signal transduction, immune response, and stress response, as seen
in the correlation between L1HS and DDI2, Ras-MAPK and NF-κB pathway. In humans,
various stresses have been shown to induce LINE1 transcription or activation, including
chemical compounds [47–49], radiation [50,51], oxidative stress [52] and aging [53].
Most of these studies have observed L1 activity in vitro, by exposing cultured cells to
stress factors and assaying the retrotransposition activity.
The negative correlation we find between TE expression and immune gene activity has
been reported before in gastrointestinal cancer samples. Jung et al. have shown that the
L1 retrotransposition rate is inversely correlated with expression of immunologic
response genes [54]. Here, we extend those results and show that the negative
correlation between TE expression and immune response is a pattern found in
non-tumorous samples as well, across different tissues and different classes of TEs. This
relationship is confusing, since it is the opposite of the positive correlation we find between
L1HS and NF-κB pathway genes (Figure 8), and the opposite of the pattern observed in
several cancer studies, where DNA hypomethylation and expression of endogenous
retroviruses activate interferon signaling [55–57]. An immune active environment
surrounding these tumor adjacent cells, plus nucleic acids in the extra-cellular
environment coming from the nearby cancer, may be putting the tumor adjacent cells in an
antiviral state. It is known that interferon signaling induces proteins that act against
viruses. ZAP is one example that degrades viral RNA as well as RNA of LINEs and Alus
[58], although ZAP does not show correlated expression with the TE modules in our
data. We hypothesize that such cell states may reduce transposon transcripts with
higher sensitivity through RNA degradation and chromatin remodeling.
The tissue samples in this study are not representative of “normal” cells, as they are
collected as controls from tissue adjacent to cancer cells. Although they are not
undergoing the molecular changes associated with malignant transformation, they
could be under the influence of the nearby environment, with changes in pH levels,
inflammation, and infiltration of immune cells. The inclusion criteria for TCGA do not
allow patients with any prior systemic chemotherapy or any other neoadjuvant therapy,
but they do allow local radiation, and we observe that past local radiation is associated
with higher TE expression levels in adjacent cells in thyroid tissues. Given the
characteristics of the samples, the variation in TE expression levels or the co-expression
pattern we observe in this study may be due to cancer-associated stress. Future studies
will be needed to confirm whether the results are replicated in true normal tissue.
TEs and KRAB-ZFPs
ChIP-Seq studies on KRAB-ZFPs have identified extensive binding between this family of
proteins and transposable elements [44,59], implying a role for suppressing TE
expression. The KRAB domain is a well-known repressor domain and together with the
co-factor KAP1 (TRIM28), the KZFP-KAP1 complex has been shown to silence both
exogenous retroviruses and endogenous retroelements during embryonic development
[60,61]. Based on this observation, and the pattern of co-evolution of retroviral LTRs
and the C2H2-Zinc Finger gene family, it has been hypothesized that the KRAB-ZFPs
function in transposable element suppression [62]. But except for a few KRAB-ZFPs,
most members do not have a characterized function. In an alternative hypothesis,
instead of its original role in silencing, it was proposed that KRAB-ZFPs may also have a
role in controlling domesticated transposable elements that contribute to the host
transcription regulation network [63]. In our co-expression analysis, we found
overwhelming positive correlation between KZFPs and TEs across all classes of TEs.
This positive correlation was observed whether we are counting multi-mapped reads or
uniquely mapped reads, and whether we are counting TEs close to genes, or TEs in the
intergenic regions. Despite this robust positive correlation, we found that the
co-expressed relationship showed limited correspondence with published ChIP-seq
binding results. There was a statistically meaningful but very small amount of overlap at
the family level, and almost no overlap at the locus level. The co-expression we observe
seems to be largely an indirect relationship, and not a result of direct binding. There
have been observations of a unique chromatin state that is shared between ZFP clusters
and repeat classes by the Roadmap Epigenomics project [64]. This chromatin state,
termed ZNF/Rpts, is characterized by H3K36me3 marks co-occurring with H3K9me3
marks and high DNA methylation. It is possible that a local chromatin environment that is
co-regulated at a larger scale is responsible for the correlation at the RNA level.
Conclusions
TE derived transcripts in the non-tumorous tissues show large variation across
tissues, and across individuals. Co-expression network analysis within tissues revealed
general co-expression of TEs across all classes. It also found strong co-expression
between TEs and KRAB-Zinc Finger Proteins that is replicated in multiple tissues, but
is not congruent with the direct binding of TE-ZFP relationships assayed through ChIP-seq.
We also found negative correlation between intronic TEs and mitochondrial genes, and
between intergenic TEs and immune response genes, replicated in multiple tissues.
Methods
RNA-Seq and gene expression quantification in the non-tumorous tissues.
We used the gene level quantification provided by The Cancer Genome Atlas (TCGA) for
the gene expressions [24–26]. We collected gene level quantifications for 697 samples
from TCGA. We focused on cancer types that had at least 10 control samples of RNA-seq
data, collected from non-tumorous tissue adjacent to the cancer tissue. As a result, 16
different tissue types were included in our analysis: BLCA (Bladder urothelial
carcinoma), BRCA (Breast carcinoma), COAD (Colon adenocarcinoma), ESCA
(Esophageal adenocarcinoma), HNSC (Head and neck squamous cell carcinoma), KICH
(Kidney chromophobe), KIRC (Kidney renal clear cell carcinoma), KIRP (Kidney renal
papillary cell carcinoma), LIHC (Liver hepatocellular carcinoma), LUAD (Lung
adenocarcinoma), LUSC (Lung squamous cell carcinoma), PRAD (Prostate
adenocarcinoma), READ (Rectum adenocarcinoma), STAD (Stomach adenocarcinoma),
THCA (Thyroid carcinoma) and UCEC (Uterine corpus endometrial carcinoma).
The number of samples for each tissue is described in Supplementary Table 1. Although we
will use the acronym for the cancer type to describe these tissues, we emphasize again
that all our samples come from the non-tumorous tissues collected from the same organ
of the same patient with the cancer. The cancer tissue samples were not included in our
analysis.
Methods for sequencing and data processing of RNA using the RNA-seq protocol for all
tissues except esophagus and stomach have been previously described for TCGA in
[24–26]. Briefly, RNA was extracted, prepared into poly(A) enriched Illumina TruSeq mRNA
libraries, sequenced by Illumina HiSeq2000 (resulting in paired 48-nt reads), and
subjected to quality control. Sequencing for esophagus and stomach was done
differently from other tissues and has been described in [27]. Briefly, polyA+ mRNA
was purified using the MultiMACS mRNA isolation kit on the MultiMACS 96 separator, and
double stranded cDNA was synthesized using the Superscript Double-Stranded cDNA
synthesis kit. Following the library preparation protocol described in [27], the final DNA
was sequenced on Illumina HiSeq2000 with paired end 75-nt reads. RNA reads were
aligned to the hg19 genome assembly using MapSplice [65]. Gene expression was
quantified for the transcript models corresponding to the TCGA GAF2.1 using RSEM
[31]. We used the raw_count values in the .rsem.genes.results files, rounded to an
integer, as the gene level quantification.
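Reading the rounded counts can be sketched as below. The sketch assumes a tab-separated file with a header row containing `gene_id` and `raw_count` columns, which is an assumption about the file layout and not something stated in the text. The rounding rule is also an assumption: Python's built-in `round` rounds halves to the nearest even integer, and the text does not say which rule was used.

```python
import csv

def read_raw_counts(handle):
    """Map gene_id -> raw_count rounded to an integer.

    `handle` is an open text file (or any iterable of lines) in the assumed
    tab-separated layout with `gene_id` and `raw_count` header columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # round() uses round-half-to-even, e.g. round(0.5) == 0 and round(1.5) == 2.
    return {row["gene_id"]: round(float(row["raw_count"])) for row in reader}
```

A typical call would wrap it in `with open(path, newline="") as fh: counts = read_raw_counts(fh)`, where `path` points to one sample's `.rsem.genes.results` file.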
Quantifying TE derived transcripts at the locus and family level
We collected RNA-seq level 1 binary alignment files (.bam files) for 697 samples
(Supplementary Table 1) from TCGA. The bam files were then converted to fastq and
realigned to the hg19 reference genome using STAR and Bowtie1. With the STAR
alignment, we allowed up to 200 mappings for every read (--outFilterMultimapNmax
200 --winAnchorMultimapNmax 200). With the Bowtie1 alignment, we only allowed
the single best alignment for each read, and if there were multiple best alignments, the
read was discarded from the final alignment (-m 1 -S -y -v 3 -X 1000 --max). We used a
modified version of the software TEtranscripts [29] for quantifying the reads mapping
to annotated transposons. TEtranscripts is software that can quantify both gene and
TE transcript levels from RNA-seq experiments. It takes into account the ambiguously
mapped TE-associated reads by proportionally assigning read counts to the
corresponding TE families using an Expectation-Maximization algorithm. We
implemented two modifications to the original TEtranscripts software. 1) We modified it
to report read counts for each individual TE locus in the reference genome in addition to
the family level counts. 2) We developed a function to discount the read counts by
removing read counts that correspond to transcripts containing TE sequences that
.CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (which wasthis version posted June 4, 2019. ; https://doi.org/10.1101/385062doi: bioRxiv preprint
![Page 29: Transcriptome analyses of tumor-adjacent somatic tissues ... · 12 TE transcript levels, and obtain a global picture of TE expression and regulation in 13 humans. An important strength](https://reader035.fdocuments.in/reader035/viewer/2022071610/614a13a912c9616cbc692e1a/html5/thumbnails/29.jpg)
29
originatefrompre-mRNAorretainedintronsinthematureRNA[28].Downstream1
analysesweredoneusingthediscountedquantificationbasedonmulti-mappedreads2
andtheuniquelymappedquantificationforboththeSTARalignmentandtheBowtie13
alignment,toassesstheimpactofuncertaintyinmulti-mappedreads.4
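The proportional assignment of ambiguously mapped reads can be illustrated with a minimal Expectation-Maximization sketch. This is not the TEtranscripts implementation: the function name `em_assign`, the input layout, and the fixed iteration count are assumptions made for illustration, and the sketch omits refinements such as length normalization.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Fractionally assign reads to TE families with a simple EM loop.

    reads: list of lists; each inner list holds the family names one read
    aligns to (one name for a unique read, several for a multi-mapper).
    Returns a dict mapping family name -> estimated read count.
    """
    families = sorted({fam for hits in reads for fam in hits})
    # Start from equal relative abundance for every family.
    abundance = {fam: 1.0 / len(families) for fam in families}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidate families in
        # proportion to the current abundance estimates.
        for hits in reads:
            total = sum(abundance[fam] for fam in hits)
            for fam in hits:
                counts[fam] += abundance[fam] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = float(len(reads))
        abundance = {fam: counts[fam] / n_reads for fam in families}
    return dict(counts)
```

With three reads unique to one family, one read unique to another, and four reads shared by both, the shared reads end up split in favour of the family with more unique support rather than evenly.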
The retrotransposon annotations used were generated from the RepeatMasker tables, obtained from the UCSC genome database and provided by TEtranscripts. For quantifying reads mapping to the TE flanking introns, we generated gtf files containing 1) the TE flanking intron positions, 2) the intergenic TE positions, and 3) the exonic TE positions (TEs that fall within an exon, including non-coding RNA genes). In the case of intronic TEs, we use the algorithm described above to discount the transcripts from pre-mRNA or retained introns. In the case of intergenic TEs, we count all EM-estimated reads mapped to TEs without any discount. In the case of exonic TEs, we ignore those counts altogether, and the exonic TEs contribute neither to the locus count nor to the family level count.
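The three counting rules can be summarized in a short sketch. The function `locus_count` and its `premrna_count` argument are hypothetical; in particular, treating the discount as a subtraction floored at zero is an assumption, since the text does not spell out the arithmetic.

```python
def locus_count(context, em_count, premrna_count=0.0):
    """Apply the per-context counting rule to one TE locus.

    context: "intronic", "intergenic" or "exonic"
    em_count: EM-estimated reads assigned to the locus
    premrna_count: hypothetical estimate of reads coming from pre-mRNA or
        retained introns (how it is derived is outside this sketch)
    """
    if context == "exonic":
        return 0.0  # exonic TEs are ignored altogether
    if context == "intergenic":
        return float(em_count)  # counted without any discount
    if context == "intronic":
        # Assumption: the discount is a subtraction floored at zero.
        return max(float(em_count) - premrna_count, 0.0)
    raise ValueError(f"unknown context: {context}")
```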
Normalization and transformation of read counts
After quantifying the reads mapping to annotated genes and TEs, both the gene level counts and the TE counts were normalized between samples across all tissue types with DESeq2. We used the default “median ratio method” for normalization in DESeq2 [66]. Briefly, the scaling factor for each sample is calculated as the median, over genes, of the ratio of each gene's read count to its geometric mean across all samples. The assumption of the median ratio method is that most genes are not consistently differentially expressed between tissues. If there is a systematic difference in ratio between samples, the median ratio will capture the size relationship. However, this assumption may be violated when comparing a large number of tissue types at the same time, since a large proportion of the genes may be differentially expressed in at least one tissue type, or one of the tissues may be extremely biased in its number of differentially expressed genes. To achieve a more robust normalization, we used a two-step normalization method called the differentially expressed genes elimination strategy (DEGES) [67]. We performed a preliminary normalization using the “median ratio method”, filtered out potential differentially expressed genes in the data, found a subset of robust non-differentially expressed genes, and used that subset to perform the second round of “median ratio normalization”. The resulting pairwise MA plots between tissues showed better normalization than the regular one-step normalization. The size factors for each sample obtained from the two-step normalization on gene counts were then used to normalize the TE quantifications of the same sample. The normalized counts were log2 transformed using the variance stabilizing transformation function in DESeq2 [66, 68] for downstream analysis.
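The median ratio calculation and the second pass on a reduced gene set can be sketched as follows. This is a plain-Python illustration, not the DESeq2 or DEGES code; the function names and the `is_candidate_de` predicate are hypothetical, and the test used to flag potentially differentially expressed genes is left to the caller.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def two_step_size_factors(counts, is_candidate_de):
    """Second pass: recompute factors on genes not flagged as candidate DE.

    is_candidate_de is a hypothetical caller-supplied predicate on gene names.
    """
    kept = {g: v for g, v in counts.items() if not is_candidate_de(g)}
    return size_factors(kept)
```

Dividing each sample's gene and TE counts by its size factor puts the samples on a common scale, which is how the gene-derived factors are reused for the TE quantifications.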
Clustering of samples by expression pattern
We clustered the samples using the “average” method (= UPGMA) in the hclust function of R, and visualized the clusters with the ComplexHeatmap package [69]. The top 150 genes or TEs with the largest variances in the log2 transformed read counts were used for clustering. We did not select genes by any measure of differential expression across tissues; they were simply the genes showing the largest variance in read count across all 697 samples, regardless of tissue type. We excluded genes and TEs on the X and Y chromosomes. Based on the log2 read count of the top 150 TEs, a dissimilarity matrix was calculated and used for the clustering and visualization. The average method of hclust computes all pairwise dissimilarities between the members of two clusters and takes their average as the distance between the two clusters. Hierarchical clustering starts with each sample assigned to its own cluster and then proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. For the locus level TE expression, we filtered out all loci that had fewer than 5 read counts for every sample. To compare the clustering of samples based on gene expression, TE expression and random assignment, we used the normalized Mutual Information (NMI) measure [70]. The hierarchical clusters were cut at k = 16, the number of different tissue types. Because the resulting clusters were not accurate enough to distinguish between similar tissues, we used a broader tissue grouping to compare with the clusters. The tissues were grouped into 10 broader types based on preliminary clustering: bladder/endometrium (BLCA, UCEC), breast (BRCA), liver (LIHC), colon/rectum (COAD, READ), esophagus/stomach (ESCA, STAD), head and neck (HNSC; the squamous epithelium in the mucosal surfaces inside the mouth, nose, and throat), kidney (KICH, KIRC, KIRP), lung (LUAD, LUSC), prostate (PRAD), and thyroid (THCA). The broader tissue type of each sample was used as the ground truth. Each resulting cluster was then assigned a group label based on the majority tissue type. Normalized mutual information was calculated by comparing the labels from the clustering to the true class labels. Random assignment clusters were generated by permuting the tissue types with and without replacement 100 times, and the mean NMI was reported.
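The majority-label assignment and the NMI comparison can be sketched in a few lines. The text does not state which normalization of mutual information was used, so dividing by the arithmetic mean of the two entropies is an assumption here, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c * n) / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def nmi(a, b):
    """NMI normalized by the mean of the two entropies (an assumption)."""
    h = 0.5 * (entropy(a) + entropy(b))
    return mutual_information(a, b) / h if h > 0 else 1.0


def majority_labels(clusters, truth):
    """Relabel each cluster with the most common true label among its members."""
    by_cluster = {}
    for c, t in zip(clusters, truth):
        by_cluster.setdefault(c, []).append(t)
    top = {c: Counter(ts).most_common(1)[0][0] for c, ts in by_cluster.items()}
    return [top[c] for c in clusters]
```

A clustering that reproduces the tissue groups scores 1, while a clustering that is independent of the tissue groups scores 0, which gives the random-assignment baseline its meaning.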
Co-expression network analysis with TEs and host genes
Weighted correlation network analysis was done with the WGCNA package [34]. We started with the signed pair-wise correlation matrix across the expression levels (normalized log2 read counts) of all genes and TE families. We calculated the adjacency matrix by raising the correlation matrix to the power of 14, a power parameter selected using the scale-free topology measure, which effectively suppresses the low correlations due to noise. A topological overlap based distance matrix (TOM) was calculated using the network topology resulting from the adjacency matrix. This procedure was repeated for each tissue, and a consensus TOM was calculated across all tissues. We used hierarchical clustering on this consensus topological overlap matrix to identify clusters (modules) that are shared across tissues. A representative expression profile of each module is defined by the first principal component of the expression levels of all members of the module. The representative profiles were compared between modules to identify positive and negative correlations between modules.
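The effect of the soft-thresholding power can be seen in a small sketch of a signed adjacency. The transform `((1 + r) / 2) ** power` follows the usual signed-network convention in WGCNA and is an assumption about the exact form used here; the function names and the dictionary input are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def signed_adjacency(profiles, power=14):
    """Soft-thresholded signed adjacency between expression profiles.

    profiles: dict mapping feature name -> list of expression values
    measured on the same samples.
    """
    names = list(profiles)
    adj = {}
    for i in names:
        for j in names:
            r = 1.0 if i == j else pearson(profiles[i], profiles[j])
            # Map r from [-1, 1] to [0, 1], then raise to the power.
            adj[(i, j)] = (0.5 * (1.0 + r)) ** power
    return adj
```

Under this transform a correlation of 0.5 yields an adjacency of about 0.018 at power 14, so only strong positive correlations contribute noticeably to the network.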
Correlated expression between genes and L1HS 5’
We used BLAST to align all the L1HS instances annotated in RepeatMasker against the L1HS consensus sequence and identified the regions aligning to the 300 bases at the 5’ end of the consensus sequence. We counted all the reads mapping to the list of L1HS 5’ ends and normalized them with the same size factors described above. We used the log2 transformed value of this normalized read count as the variable representing the L1HS transcript level. Correlation between gene and L1HS 5’ transcripts was tested in each tissue group separately: bladder, breast, liver, colon/rectum, stomach/esophagus, head and neck, kidney, lung, prostate and thyroid. We tested 20532 genes for each tissue group using a linear model with log2 L1HS 5’ expression as the dependent variable and log2 gene expression as the independent variable. For a gene to be included in our test, it had to be present in at least eight individual patients. We also required that the gene be expressed with a minimum RPM of 2 in 75% of the samples to be included in the dataset. In addition to the radiation therapy for thyroid tissue, we considered effective library size (sum of all normalized counts) and the batch ID provided by the TCGA project as additional covariates. Since there was significant co-expression across all TE classes, especially for the intronic TEs, we included the expression profile of the intronic TE module N1 identified during the co-expression network analysis as a covariate in our linear model. The linear model we used is described below.
$$\log_2(\mathrm{L1HS}) \sim \log_2(\mathrm{gene}) + \log_2(\mathrm{effective\_library\_size}) + \mathrm{batch} + \mathrm{radiation} + \log_2(\mathrm{N1\_profile})$$
We tested all combinations of linear models that can be created by including or excluding these variables. The second-order Akaike Information Criterion (AICc) was used to select the best linear model. We used the coefficient and p-value from the best model to calculate the q-values. Genes with q-value < 0.0001 in at least two tissues were identified as correlated genes.
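The all-subsets search scored by AICc can be sketched as below. The least-squares form of AICc is standard, but the details are assumptions: the gene term is taken to be always retained, the covariate names are placeholders, and `fit` is a hypothetical callback that returns the residual sum of squares and parameter count for a given set of terms.

```python
import math
from itertools import combinations


def aicc(rss, n, k):
    """Second-order AIC for a least-squares fit.

    rss: residual sum of squares; n: number of samples;
    k: number of estimated parameters.
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def candidate_models(required, optional):
    """Yield every model keeping `required` terms plus any subset of `optional`."""
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            yield tuple(required) + subset


def best_model(models, fit, n):
    """Return the model with the lowest AICc.

    fit: hypothetical callback mapping a tuple of terms to (rss, k).
    """
    def score(model):
        rss, k = fit(model)
        return aicc(rss, n, k)
    return min(models, key=score)
```

With four optional covariates this enumerates 16 candidate models per gene and tissue group; the correction term matters most when the number of samples is small relative to the number of parameters.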
Correlation between TE and KRAB-ZFPs
To understand the positive correlation between TEs and KRAB-ZFPs, we looked at the correlation between each KZFP and TEs at the family level and at the individual TE locus level in different tissue types. We tested the correlation for 366 KRAB Zinc Finger Proteins that were identified in Imbeault et al. [44] and also found in our gene expression data. Because the search space of pairwise combinations of KZFPs and individual TE loci was too large, we examined the relationship in a step-wise approach. In the first step, we tested the correlation between all pairwise combinations of 366 KZFPs and 979 TE subfamilies using the TE quantification at the family level in each tissue type. Then, in the second step, once the significantly correlated KZFP and TE family pairs were identified, we focused on those pairs. We tested the correlation between the expression of the significant KZFP and the expression of each individual locus of the significant TE family, in the tissue where the initial co-expression was found, to identify individual TE loci that are co-expressed with the KZFP.
Overlap between co-expression and binding was examined at the family level and at the locus level. At the family level, we downloaded the family enrichment results from Imbeault et al. [44] and identified pairs of TE families and KZFPs that had an enrichment score greater than 1. We compared those families enriched with binding of specific KZFPs to our co-expression results, to check if the TE families were co-expressed with the same KZFPs. At the locus level, we compared the co-expressed TE loci with the binding peaks reported in the dataset GSE78099. We took ±250 bp around the boundary of peaks and found overlap with TE annotations from RepeatMasker. We checked if the TE loci overlapping with ChIP-seq peaks were found to be co-expressed with any KZFPs.
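The locus-level overlap check can be sketched with simple interval arithmetic. Reading the ±250 bp window as a widening of each peak by 250 bp on both sides is an assumption, the tuple layouts and function names are hypothetical, and a real analysis over many peaks would typically use an interval index rather than this nested loop.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals on the same chromosome share a base."""
    return a_start <= b_end and b_start <= a_end


def te_loci_near_peaks(te_loci, peaks, pad=250):
    """Return names of TE loci overlapping any peak widened by `pad` bp.

    te_loci: list of (name, chrom, start, end)
    peaks:   list of (chrom, start, end)
    """
    hits = []
    for name, chrom, start, end in te_loci:
        for p_chrom, p_start, p_end in peaks:
            if chrom == p_chrom and overlaps(start, end,
                                             p_start - pad, p_end + pad):
                hits.append(name)
                break  # one overlapping peak is enough for this locus
    return hits
```

The resulting list of loci can then be intersected with the loci found to be co-expressed with a given KZFP.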
List of Abbreviations
TE: Transposable Element
KZFP: KRAB Zinc Finger Protein
TCGA: The Cancer Genome Atlas
Declaration
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
The datasets generated and/or analysed during the current study are available in the GitHub repository, https://github.com/HanLabUNLV/TEcoex. The modified version of the TEtranscripts software [29] and the required gtf files can be found at https://github.com/HanLabUNLV/tetoolkit.
Competing interests
The authors declare that they have no competing interests.
Funding
This work was supported by the National Institutes of Health [R15GM116108, P20GM121325 to M.V.H.], and by the National Science Foundation [1750532 to M.V.H.].
Authors' contributions
NC performed the co-expression analysis. GMJ performed TEtranscripts quantification. SQ performed the REC score analysis. NC and AR analyzed the KZFP-TE locus pairwise co-expression and KZFP motif search. CS re-ran the pipeline with the Bowtie and STAR alignment programs. AA, CC, DC and ON assisted with the analyses. MVH designed the experiments, modified the TEtranscripts software, and analyzed and interpreted the data. MVH wrote the manuscript with the help of all other authors. All authors read and approved the final manuscript.
Acknowledgements
We thank the TCGA and GTEx project teams for making the data available.
References
1. Slotkin RK. The case for not masking away repetitive DNA. Mob DNA. 2018;9:15.
2. Branciforte D, Martin SL. Developmental and cell type specificity of LINE-1 expression in mouse testis: implications for transposition. Mol Cell Biol. 1994;14:2584–92.
3. Trelogan SA, Martin SL. Tightly regulated, developmentally specific expression of the first open reading frame from LINE-1 during mouse embryogenesis. Proc Natl Acad Sci. 1995;92:1520–4.
4. Ergün S, Buschmann C, Heukeshoven J, Dammann K, Schnieders F, Lauke H, et al. Cell Type-specific Expression of LINE-1 Open Reading Frames 1 and 2 in Fetal and Adult Human Tissues. J Biol Chem. 2004;279:27753–63.
5. Kubo S, Seleme M del C, Soifer HS, Perez JLG, Moran JV, Kazazian HH, et al. L1 retrotransposition in nondividing and primary human somatic cells. Proc Natl Acad Sci. 2006;103:8036–41.
6. Belancio VP, Roy-Engel AM, Pochampally RR, Deininger P. Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 2010;38:3909–22.
7. Rangwala SH, Zhang L, Kazazian HH. Many LINE1 elements contribute to the transcriptome of human somatic cells. Genome Biol. 2009;10:R100.
8. Skowronski J, Singer MF. Expression of a cytoplasmic LINE-1 transcript is regulated in a human teratocarcinoma cell line. Proc Natl Acad Sci USA. 1985;82:6050–4.
9. Bratthauer GL, Cardiff RD, Fanning TG. Expression of LINE-1 retrotransposons in human breast cancer. Cancer. 1994;73:2333–6.
10. Rodić N, Sharma R, Sharma R, Zampella J, Dai L, Taylor MS, et al. Long Interspersed Element-1 Protein Expression Is a Hallmark of Many Human Cancers. Am J Pathol. 2014;184:1280–6.
11. Bratthauer GL, Fanning TG. Active LINE-1 retrotransposons in human testicular cancer. Oncogene. 1992;7:507–10.
12. Philippe C, Vargas-Landin DB, Doucet AJ, Essen D van, Vera-Otarola J, Kuciak M, et al. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. eLife. 2016;5:e13926.
13. Muotri AR, Chu VT, Marchetto MCN, Deng W, Moran JV, Gage FH. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005;435:903.
14. Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009;41:563–71.
15. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature. 2012;489:101–8.
16. Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2016;18:71.
17. Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 2014;24:1963–76.
18. Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, et al. Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs. PLOS Genet. 2013;9:e1003470.
19. Jachowicz JW, Bing X, Pontabry J, Bošković A, Rando OJ, Torres-Padilla M-E. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat Genet. 2017;49:1502.
20. Percharde M, Lin C-J, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, et al. A LINE1-Nucleolin Partnership Regulates Early Development and ESC Identity. Cell. 2018;174:391-405.e19.
21. Liu N, Lee CH, Swigut T, Grow E, Gu B, Bassik MC, et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature. 2017;553:228.
22. Taylor MS, Altukhov I, Molloy KR, Mita P, Jiang H, Adney EM, et al. Dissection of affinity captured LINE-1 macromolecular complexes. eLife. 2018;7:e30094.
23. Mita P, Wudzinska A, Sun X, Andrade J, Nayak S, Kahler DJ, et al. LINE-1 protein localization and functional dynamics during the cell cycle. eLife. 2018;7:e30058.
24. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330.
25. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61.
26. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519.
27. The Cancer Genome Atlas Research Network, Bass AJ, Thorsson V, Shmulevich I, Reynolds SM, Miller M, et al. Comprehensive molecular characterization of gastric adenocarcinoma. Nature. 2014;513:202.
28. Deininger P, Morales ME, White TB, Baddoo M, Hedges DJ, Servant G, et al. A comprehensive approach to expression of L1 loci. Nucleic Acids Res. 2017;45:e31–e31.
29. Jin Y, Tam OH, Paniagua E, Hammell M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinforma Oxf Engl. 2015;31:3593–9.
30. Britten RJ. Mobile elements inserted in the distant past have taken on important functions. Gene. 1997;205:177–82.
31. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
32. Yang WR, Ardeljan D, Pacyna CN, Payer LM, Burns KH. SQuIRE reveals locus-specific regulation of interspersed repeat expression. Nucleic Acids Res. 2019;47:e27–e27.
33. Doucet-O’Hare TT, Rodić N, Sharma R, Darbari I, Abril G, Choi JA, et al. LINE-1 expression and retrotransposition in Barrett’s esophagus and esophageal carcinoma. Proc Natl Acad Sci. 2015;112:E4894.
34. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
35. Desai N, Sajed D, Arora KS, Solovyov A, Rajurkar M, Bledsoe JR, et al. Diverse repetitive element RNA expression defines epigenetic and immunologic features of colon cancer. JCI Insight. 2017;2:e91078.
36. Solovyov A, Vabret N, Arora KS, Snyder A, Funt SA, Bajorin DF, et al. Global Cancer Transcriptome Quantifies Repeat Element Polarization between Immunotherapy Responsive and T Cell Suppressive Classes. Cell Rep. 2018;23:512–21.
37. Menendez L, Benigno BB, McDonald JF. L1 and HERV-W retrotransposons are hypomethylated in human ovarian carcinomas. Mol Cancer. 2004;3:12.
38. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–7.
39. Choi J, Hwang S-Y, Ahn K. Interplay between RNASEH2 and MOV10 controls LINE-1 retrotransposition. Nucleic Acids Res. 2018;46:1912–26.
40. de la Rica L, Deniz Ö, Cheng KCL, Todd CD, Cruz C, Houseley J, et al. TET-dependent regulation of retrotransposable elements in mouse embryonic stem cells. Genome Biol. 2016;17:234.
41. Macfarlan T, Kutney S, Altman B, Montross R, Yu J, Chakravarti D. Human THAP7 Is a Chromatin-associated, Histone Tail-binding Protein That Represses Transcription via Recruitment of HDAC3 and Nuclear Hormone Receptor Corepressor. J Biol Chem. 2005;280:7346–58.
42. Koizumi S, Irie T, Hirayama S, Sakurai Y, Yashiroda H, Naguro I, et al. The aspartyl protease DDI2 activates Nrf1 to compensate for proteasome dysfunction. Dikic I, editor. eLife. 2016;5:e18357.
43. Sun X, Wang X, Tang Z, Grivainis M, Kahler D, Yun C, et al. Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression. Proc Natl Acad Sci. 2018;115:E5526.
44. Imbeault M, Helleboid P-Y, Trono D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature. 2017;543:550–4.
45. The ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57.
46. Scott EC, Gardner EJ, Masood A, Chuang NT, Vertino PM, Devine SE. A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer. Genome Res. 2016;26:745–55.
47. Okudaira N, Iijima K, Koyama T, Minemoto Y, Kano S, Mimori A, et al. Induction of long interspersed nucleotide element-1 (L1) retrotransposition by 6-formylindolo[3,2-b]carbazole (FICZ), a tryptophan photoproduct. Proc Natl Acad Sci. 2010;107:18487–18492.
48. Stribinskis V, Ramos KS. Activation of Human Long Interspersed Nuclear Element 1 Retrotransposition by Benzo(a)pyrene, an Ubiquitous Environmental Carcinogen. Cancer Res. 2006;66:2616–2620.
49. Terasaki N, Goodier JL, Cheung LE, Wang YJ, Kajikawa M, Kazazian HH Jr, et al. In Vitro Screening for Compounds That Enhance Human L1 Mobilization. PLOS ONE. 2013;8:e74629.
50. Banaz-Yaşar F, Gedik N, Karahan S, Diaz-Carballo D, Bongartz BM, Ergün S. LINE-1 Retrotransposition Events Regulate Gene Expression After X-Ray Irradiation. DNA Cell Biol. 2012;31:1458–67.
51. Farkash EA, Kao GD, Horman SR, Prak ETL. Gamma radiation increases endonuclease-dependent L1 retrotransposition in a cultured cell assay. Nucleic Acids Res. 2006;34:1196–204.
52. Giorgi G, Marcantonio P, Del Re B. LINE-1 retrotransposition in human neuroblastoma cells is affected by oxidative stress. Cell Tissue Res. 2011;346:383–91.
53. Van Meter M, Kashyap M, Rezazadeh S, Geneva AJ, Morello TD, Seluanov A, et al. SIRT6 represses LINE1 retrotransposons by ribosylating KAP1 but this repression fails with stress and age. Nat Commun. 2014;5:5011.
54. Jung H, Choi JK, Lee EA. Immune signatures correlate with L1 retrotransposition in gastrointestinal cancers. Genome Res [Internet]. 2018; Available from: http://genome.cshlp.org/content/early/2018/07/03/gr.231837.117.abstract
55. Chiappinelli KB, Strissel PL, Desrichard A, Li H, Henke C, Akman B, et al. Inhibiting DNA Methylation Causes an Interferon Response in Cancer via dsRNA Including Endogenous Retroviruses. Cell. 2015;162:974–86.
56. Roulois D, Loo Yau H, Singhania R, Wang Y, Danesh A, Shen SY, et al. DNA-Demethylating Agents Target Colorectal Cancer Cells by Inducing Viral Mimicry by Endogenous Transcripts. Cell. 2015;162:961–73.
57. Haffner MC, Taheri D, Luidy-Imada E, Palsgrove DN, Eich M-L, Netto GJ, et al. Hypomethylation, endogenous retrovirus expression, and interferon signaling in testicular germ cell tumors. Proc Natl Acad Sci. 2018;115:E8580.
58. Moldovan JB, Moran JV. The Zinc-Finger Antiviral Protein ZAP Inhibits LINE and Alu Retrotransposition. PLOS Genet. 2015;11:e1005121.
59. Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol. 2015;33:555–62.
60. Wolf D, Goff SP. Embryonic stem cells use ZFP809 to silence retroviral DNAs. Nature. 2009;458:1201–4.
61. Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014;516:242–5.
62. Rowe HM, Trono D. Dynamic control of endogenous retroviruses during development. Virology. 2011;411:273–87.
63. Trono D. Transposable Elements, Polydactyl Proteins, and the Genesis of Human-Specific Transcription Networks. Cold Spring Harb Symp Quant Biol. 2015;80:281–8.
64. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317.
65. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38:e178–e178.
66. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
67. Kadota K, Nishiyama T, Shimizu K. A normalization strategy for comparing tag count data. Algorithms Mol Biol. 2012;7:5.
68. Huber W, von HA, Sueltmann H, Poustka A, Vingron M. Parameter estimation for the calibration and variance stabilization of microarray data. Stat Appl Genet Mol Biol. 2003;2:Article3.
69. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–9.
70. Cover TM, Thomas JA. Elements of Information Theory. New York: Wiley & Sons; 1991.
71. Caracausi M, Piovesan A, Antonaros F, Strippoli P, Vitale L, Pelleri MC. Systematic identification of human housekeeping genes possibly useful as references in gene expression studies. Mol Med Rep. 2017;16:2397–410.
72. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet TIG. 2013;29:569–74.
Figures
Figure 1. Comparison of read alignment on full length L1HS with multi-mapping and unique mapping.
Reads mapped to two different full length L1HS elements from a stomach tissue sample (A4GY) visualized through IGV. a. L1HS_dup967, a 5799 nt element on chromosome X:75453754-75459553. b. L1HS_dup924, a 6225 nt element on chromosome X:11953208-11959433. Chromosomal locations, 48 base mappability calculated with GEM, Bowtie1 alignment allowing only uniquely mapped reads with a single best hit, STAR alignment allowing multi-mapped reads with up to 200 mappings for each read, gene annotation and RepeatMasker TE annotation are shown from top to bottom. Red lines mark the boundary of the L1HS elements with 5’ and 3’ noted.
Figure 2. Relationship between read counts for each L1HS locus, and the mappability of the locus.
Log2 transformed total read counts mapped to each L1HS locus in the hg19 genome, summed across all samples in our dataset, are plotted against the total uniquely mappable positions (number of positions with mappability score = 1 based on 48 bp mappability calculated with GEM) for each locus. L1HS loci with zero read counts are marked at -1 instead of -infinity. a. Read counts from the Bowtie1 alignment with uniquely mapped reads only. b. Read counts from the STAR alignment allowing multi-mapped reads up to 200. c. Comparison of read counts for each L1HS locus between the uniquely mapped reads and multi-mapped reads.
Figure 3. Tissue clustering based on transposable elements
Heatmap showing the tissue clustering results based on the top 150 TEs with the largest variance of each class. In the color bars at the top of the heatmap, the upper color labels show tissue types and the lower color labels show broader tissue groupings. a-d. Clustering based on family-level quantification. e-h. Clustering based on locus-level quantification. i. Clustering quality measured by Mutual Information for clustering results based on family-level TE quantification, locus-level quantification for TEs 100 Kb away from genes, locus-level quantification for TEs 1 Kb away from genes, and locus-level TE quantification without filtering for gene proximity.

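The two ingredients named in the Figure 3 legend, selecting the most variable TEs and scoring a clustering against tissue labels with mutual information, can be sketched in plain Python. This is an illustration under assumptions: the function names are hypothetical, population variance is used for ranking, and the mutual information is the unnormalized form in bits; the legend does not specify which variant the authors used.

```python
import math
from collections import Counter
from statistics import pvariance

def top_variance_features(matrix, n):
    """Return the names of the n features with the largest variance.

    matrix maps a feature name (for example a TE family) to its list of
    expression values across samples.
    """
    ranked = sorted(matrix, key=lambda name: pvariance(matrix[name]), reverse=True)
    return ranked[:n]

def mutual_information(labels_a, labels_b):
    """Mutual information (in bits) between two labelings of the same samples."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi

# Hypothetical toy data (illustration only).
toy_matrix = {"TE1": [1, 1, 1, 1], "TE2": [0, 10, 0, 10], "TE3": [4, 5, 4, 5]}
selected = top_variance_features(toy_matrix, 2)

tissues = ["breast", "breast", "liver", "liver"]
perfect_clusters = [0, 0, 1, 1]
uninformative_clusters = [0, 1, 0, 1]
mi_perfect = mutual_information(tissues, perfect_clusters)
mi_uninformative = mutual_information(tissues, uninformative_clusters)
```

In the toy data, a clustering that reproduces the two tissues exactly carries 1 bit of information about the tissue label, while a clustering that splits each tissue evenly carries none.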
Figure 4. Representative transposable element loci that show tissue-specific expression
15 TE loci with tissue-specific expression were identified as representative examples among intergenic TEs that are 100 Kb away from the start and end of genes. Heatmap color reflects the z-score of normalized log2 read counts across samples.

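The z-score used for the heatmap colors can be sketched as follows. The function name `zscores` and the example values are hypothetical, and the use of the population standard deviation is an assumption; the legend does not say whether the population or the sample standard deviation was used.

```python
from statistics import mean, pstdev

def zscores(values):
    """Z-score each value against the mean and standard deviation of the row.

    values holds the normalized log2 read counts of one TE locus across
    samples. A constant row has zero spread, so it is mapped to all zeros.
    """
    mu = mean(values)
    sd = pstdev(values)  # population SD; an assumption, see the lead-in
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical normalized log2 counts for one locus in three samples.
row = [1.0, 2.0, 3.0]
row_z = zscores(row)
```

Scaling each locus separately puts loci with very different absolute expression levels on a common color scale, which is what makes tissue-specific patterns comparable across rows of the heatmap.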
Figure 5. Variation in TE expression across tissues and individuals
a. Normalized and log-transformed sums of all read counts mapping to the TEs of each class (LINE, DNA, SINE and LTR) are shown as violin plots. The mean read count of a set of housekeeping genes (Supplementary Table 6) is plotted as a reference. The horizontal line across each violin plot represents the median value across all samples of that tissue type. b. Normalized and log-transformed sums of all read counts mapping to 300 bp at the 5’ end of L1HS are plotted as violin plots. The same set of housekeeping genes used in a. is plotted as a reference.
Figure 6. Co-expression modules with immune genes are negatively correlated with TE modules.
a. Enriched annotations for genes belonging to module M33, identified in the Reactome database. b. Enriched annotations for genes belonging to module M35, identified in the Reactome database. c. Top 30 genes with the highest module membership (high correlation with the representative profile of the module) for each module. d-e. Correlation plots showing high within-group correlation and negative between-group correlation between the TEs in the TE modules and the immune genes in modules M33 and M35. d. Data from breast. e. Data from esophagus. Color labels on top of the correlation plots show the different classes of TEs, and genes that are annotated with the GO term “immune system process”.
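The pattern described in panels d-e, high correlation within each group and negative correlation between groups, can be illustrated with a short Python sketch. Pearson correlation is an assumption here (the legend does not name the correlation measure), and the function names and expression profiles are hypothetical.

```python
import math
from itertools import combinations, product

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_within(group):
    """Mean correlation over all pairs of profiles inside one group."""
    pairs = list(combinations(group.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

def mean_between(group_a, group_b):
    """Mean correlation over all pairs with one profile from each group."""
    pairs = list(product(group_a.values(), group_b.values()))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Hypothetical expression profiles across four samples (illustration only).
te_module = {"TE_1": [1, 2, 3, 4], "TE_2": [2, 4, 6, 8]}
immune_module = {"gene_1": [4, 3, 2, 1], "gene_2": [8, 6, 4, 2]}

within_te = mean_within(te_module)
within_immune = mean_within(immune_module)
between = mean_between(te_module, immune_module)
```

With these toy profiles both within-group means are 1 and the between-group mean is -1, the extreme version of the block structure that such a correlation plot displays.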

Figure 7. Co-expression modules with mitochondrial genes and ribosome genes are negatively correlated with intronic TEs.
a. Enriched annotations for genes belonging to module N4, identified in the Reactome database. b-c. Correlation plots showing high within-group correlation and negative between-group correlation between the TEs in the intronic TE module N1 and the mitochondrial and ribosomal genes in module N4. b. Data from breast. c. Data from esophagus. Color labels on top of the correlation plots show the different classes of TEs, and genes that are annotated with the GO terms “mitochondrion” and “ribosome”.

Figure 8. Genes that show positive and negative correlation with the transcript level of L1HS 5’ end.
Genes that show significant positive and negative correlation with L1HS 5’ in multiple tissues. Esophagus and stomach are combined as one tissue group.
Figure 9. Overlap between co-expressed KZFP-TE family pairs and TE families enriched for KZFP binding.
a. KZFP-TE families co-expressed in at least two tissues are marked with pink, TE families enriched for KZFP binding are marked with yellow, and TE families that are both bound by a KZFP and co-expressed with the same KZFP are marked with green. KZFPs that show overlap of binding and co-expression for multiple TE families are labeled along the vertical axis. b-c. Categorization of co-expression and KZFP binding for all 200,889 KZFP-TE family pair-wise combinations (221 KZFPs x 909 TE families) that have both expression and ChIP-exo data. b. Counts co-expression significant in at least one tissue. c. Counts co-expression significant in at least two tissues.

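The categorization in panels b-c amounts to a cross-tabulation of every KZFP-TE family pair by two yes/no properties. The sketch below shows one way to express that; the function name `categorize_pairs`, the category labels, and the toy identifiers are hypothetical, and only the pair total (221 x 909 = 200,889) comes from the legend.

```python
def categorize_pairs(kzfps, te_families, coexpressed, bound):
    """Count KZFP-TE family pairs by co-expression and binding status.

    coexpressed and bound are sets of (kzfp, te_family) tuples.
    Every pair in the full cross product falls into exactly one category.
    """
    counts = {"both": 0, "coexpressed_only": 0, "bound_only": 0, "neither": 0}
    for kzfp in kzfps:
        for te in te_families:
            pair = (kzfp, te)
            is_coexpressed = pair in coexpressed
            is_bound = pair in bound
            if is_coexpressed and is_bound:
                counts["both"] += 1
            elif is_coexpressed:
                counts["coexpressed_only"] += 1
            elif is_bound:
                counts["bound_only"] += 1
            else:
                counts["neither"] += 1
    return counts

# Hypothetical toy input (illustration only).
toy_kzfps = ["K1", "K2"]
toy_tes = ["T1", "T2", "T3"]
toy_coexpressed = {("K1", "T1"), ("K2", "T3")}
toy_bound = {("K1", "T1"), ("K1", "T2")}
toy_counts = categorize_pairs(toy_kzfps, toy_tes, toy_coexpressed, toy_bound)

# Size of the full cross product stated in the legend.
total_pairs = 221 * 909
```

Because the four categories partition the cross product, their counts always sum to the number of KZFPs times the number of TE families, which gives a quick consistency check on any such table.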
Figure 10. Overlap between expression and KZFP binding for TE loci
a. Categorization of all 4,496,028 TE loci annotated in hg19 RepeatMasker by expression and KZFP binding. b. Overlap of expression and KZFP binding with ENCODE Candidate Regulatory Element marks. c. Proportion of each category of TEs that are marked with ENCODE Candidate Regulatory Element marks.

Tables

Table 1. Transposon loci that show large difference after correcting for pre-mRNA/retained introns.
a. Transposon loci embedded within introns or exons of genes that frequently result in the largest correction in each sample. Locus id, genomic location, surrounding gene and structure the TE is embedded in, and the maximum number of reads removed in a sample.

| locus | chr | start | end | surrounding gene | TE embedded in | # of samples | Max correction |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MIRc_dup47590 | 8 | 22021288 | 22021431 | SFTPC | Intron 4, Exon 5 | 105 | 428021 |
| AluY_dup80589 | 12 | 69747275 | 69747567 | LYZ | Exon 4 | 62 | 313317 |
| MIRb_dup137684 | 10 | 81315669 | 81315913 | SFTPA2 | Exon 5 | 106 | 266566 |
| MIRb_dup137689 | 10 | 81374907 | 81375150 | SFTPA1 | Exon 5 | 106 | 230394 |
| MIR3_dup57107 | 10 | 81316603 | 81316678 | SFTPA2 | Exon 5 | 103 | 90241 |
| AluSz6_dup3320 | 1 | 207102295 | 207102608 | PIGR | Exon 11 | 124 | 59130 |
| MIRc_dup74805 | 12 | 50351953 | 50352157 | AQP2 | Exon 4 | 109 | 57581 |
| LTR39_dup404 | 6 | 160102172 | 160102969 | SOD2 | Exon 4, Intron 7 | 119 | 34545 |
| AluSx1_dup59209 | Y | 21153222 | 21153521 | TTTY14 | Exon 1 | 305 | 25867 |
| AluJb_dup119100 | 17 | 16344881 | 16345132 | C17orf76-AS1 | Intron 4, Exon 5 | 253 | 21915 |

Table 2. KZFP gene members in core TE modules
KZFP genes that are members of the core TE modules, and of module M3, which is correlated with the core TE modules.

| core TE modules | KZFP | chromosome | KZFPs in correlated module M3 | chromosome |
| --- | --- | --- | --- | --- |
| M8 | HKR1 | 19 | KZFP169 | 9 |
| | KZFP226 | 19 | KZFP202 | 11 |
| | KZFP682 | 19 | KZFP266 | 19 |
| | KZFP789 | 7 | KZFP300 | 5 |
| | KZFP814 | 19 | KZFP320 | 19 |
| M21 | KZFP404 | 19 | KZFP431 | 19 |
| | KZFP418 | 19 | KZFP439 | 19 |
| | KZFP589 | 3 | KZFP44 | 19 |
| | KZFP75A | 19 | KZFP587 | 19 |
| M38 | KZFP117 | 7 | KZFP662 | 3 |
| M45 | KZFP334 | 20 | KZFP7 | 8 |
| | KZFP493 | 19 | KZFP700 | 19 |
| | KZFP506 | 19 | KZFP708 | 19 |
| | KZFP721 | 4 | KZFP714 | 19 |
| | KZFP737 | 19 | KZFP732 | 4 |
| | KZFP83 | 19 | KZFP841 | 19 |

Table 3. Gene members in the intronic TE module N1.
Genes that are members of the intronic TE module N1, the number of tissues in which they are assigned to module N1, synonyms of the gene names, and full gene names.

| Gene Symbol | tissues | Synonyms | Full gene name |
| --- | --- | --- | --- |
| NCRNA00201 | 7 | HNRNPU | heterogeneous nuclear ribonucleoprotein U |
| AHSA2 | 6 | AHSA2P | activator of HSP90 ATPase homolog 2, pseudogene |
| CCNL2 | 6 | CCNL2 | cyclin L2 |
| CG030 | 5 | N4BP2L2-IT2 | N4BPL2 intronic transcript 2 |
| FAM13AOS | 5 | FAM13A-AS1 | FAM13A antisense RNA 1 |
| MDM4 | 5 | MDM4 | MDM4 regulator of p53 |
| NKTR | 5 | NKTR | natural killer cell triggering receptor |
| SLC25A27 | 5 | UCP4 | solute carrier family 25 member 27 |
| ANKRD36 | 4 | ANKRD36 | ankyrin repeat domain 36 |
| LOC100190986 | 4 | LOC100190986 | uncharacterized LOC100190986 |
| LOC440944 | 4 | THUMPD3-AS1, SETD5-AS1 | THUMPD3 antisense RNA 1 |
| LOC91316 | 4 | GUSBP11 | GUSB pseudogene 11 |
| LUC7L | 4 | LUC7L | LUC7 like |
| LUC7L3 | 4 | LUC7L3 | LUC7 like 3 pre-mRNA splicing factor |
| NCRNA00105 | 4 | ASMTL-AS1 | ASMTL antisense RNA 1 |
| OGT | 4 | OGT | O-linked N-acetylglucosamine (GlcNAc) transferase |
| SEC31B | 4 | SEC31B | SEC31 homolog B, COPII coat complex component |
| KZFP789 | 4 | KZFP789 | zinc finger protein 789 |