Transcriptome analyses of tumor-adjacent somatic tissues reveal genes co-expressed with transposable elements

Nicky Chung 1,*, G.M. Jonaid 1,*, Sophia Quinton 1,*, Austin Ross 1,*, Corinne E. Sexton 1, Adrian Alberto 2, Cody Clymer 2, Daphnie Churchill 2, Omar Navarro Leija 2 and Mira V. Han 1,3,§

1 School of Life Sciences, University of Nevada, Las Vegas, NV 89154, USA
2 Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
3 Nevada Institute of Personalized Medicine, Las Vegas, NV 89154, USA

* These authors contributed equally to this work
§ Corresponding author

Email addresses:
NC: [email protected]
GMJ: [email protected]
SQ: [email protected]
AR: [email protected]
CS: [email protected]
AA: [email protected]
CC: [email protected]
DC: [email protected]
ON: [email protected]
MVH: [email protected]

bioRxiv preprint, doi: https://doi.org/10.1101/385062; this version posted June 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.


Abstract

Background

Despite the long-held assumption that transposons are normally only expressed in the germ-line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals is unknown, and the co-expression between TEs and host gene mRNAs has not been examined.

Results

Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, there are individual TE loci that exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented gene members of the TE modules, showing positive correlation across multiple tissues. But we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies.

Conclusions

We find unexpected variation in TE-derived transcripts, within and across non-tumorous tissues. We describe a broad view of the RNA state for non-tumorous tissues exhibiting higher levels of TE transcripts. Tissues with higher levels of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs, and lower RNA levels of immune genes.

Keywords

Transposon, TE, L1HS, RNA-seq, co-expression, mitochondria, KRAB zinc finger

Background

Although transposable elements (TEs) have been studied for a long time, their ubiquitous and highly tissue-specific expression patterns are starting to be appreciated only recently. The fact that TEs compose close to 40% of the human genome is frequently emphasized, but the fact that there is an observable amount of TE-derived transcripts in human RNA-seq data has mostly been ignored or regarded as a nuisance without any functional relevance [1]. LINE elements have long been thought to be expressed only in the germline cells [2–4]. But both full-length and partial transcripts of LINEs are frequently found in somatic cells [4–6], with large variation in expression levels across tissue types and among different individuals [7]. The level of TE expression is especially pronounced in cancer cells [8–11] and cell lines [12], but TE expression is also observed in neurogenesis [13] and normal somatic tissue. Faulkner et al. in 2009 was the first study to provide a global picture of the significant contribution of retrotransposons to the human transcriptome in multiple tissue types [14]. This report showed that 6-30% of transcripts had transcription start sites located within transposons, and that these transposons were expressed in a tissue-specific manner and influenced the transcription of nearby genes. The results were extended by Djebali et al. in 2012, showing again the tissue-specificity of transposon expression, and that most of these transcripts are found in the nuclear part of the cell [15]. In addition to the tissue-specific expression of TEs, important regulatory roles for TEs are emerging (reviewed in [16]). Observations include contribution to transcription start sites [14], source of transcription factor binding sites [17], source of long non-coding RNAs [18], active transcription during early development [19], and even critical function similar to long non-coding RNAs that guide chromatin-remodeling complexes to specific loci in the genome [20].

Although there are many reports of TE expression in the somatic cells, there is still a large gap in our understanding of how TE expression is repressed and de-repressed in human somatic cells. Based on what we have learned so far, TE expression is regulated through multiple layers, consisting of transcription factors, epigenetic modification, PIWI-interacting RNAs (piRNAs), RNA interference (RNAi), and posttranscriptional host factors. Recently, two different approaches of genome-wide screening have identified proteins that regulate different aspects of the activities of LINE elements. A CRISPR–Cas9 screen was used to identify proteins that restrict LINE activity [21]. The protein MORC2 and the human silencing hub (HUSH) complex were shown to selectively bind evolutionarily young, full-length LINEs located within euchromatic environments, and promote deposition of histone H3 Lys9 trimethylation (H3K9me3) for transcriptional silencing [21]. And through proteomics approaches, two studies have recently identified the localization of ORF1 and ORF2 proteins and their interacting partners [22], and the timing of the entrance of the ORF2 protein complex into the nucleus [23]. But no study has yet examined the correlation in transcript levels of host mRNA and transposon RNA.

Recently, high-throughput RNA-seq data of various types of cancer samples and their normal counterparts have become available in The Cancer Genome Atlas (TCGA) [24–26]. By focusing on the non-tumorous tissue samples from TCGA, we can access thousands of natural experiments across various types of tissues that show variation in TE transcript levels, and obtain a global picture of TE expression and regulation in humans. An important strength of the TCGA dataset is the large number of samples collected for each tissue type and the high depth of the RNA-seq experiment, with a median of about 150M reads per sample, which is several times larger than a usual RNA-seq library. The variation in TE transcript levels observed in multiple samples within each tissue allowed us to analyze the co-expression patterns between host genes and TEs for the first time. We hypothesized that genes that regulate the transcription level of TEs would show correlation in expression levels with the TE transcripts. Since the samples are collected from fresh-frozen tissues, TE transcript levels are observed in vivo, complementing the studies that focus on retrotransposition assays or transposon expression in human cell lines.

We first summarize the survey of TE expression variation found in the RNA-seq data from 697 samples of cancer-adjacent non-tumorous tissue. We confirm the earlier findings that TE expression varies across tissue types. Transcript levels of individual TE loci are highly tissue specific, and within each family only a few individual loci are highly expressed, contributing the bulk of the transposon transcripts at the family level. We also find large variation in total TE transcript level across individual samples within each tissue type.

Although transposons have strong tissue-specific patterns at the locus level, we also found that the majority of TEs show global co-expression at the family level across samples. By analyzing the co-expression between these TEs and individual genes, we found co-expression modules of TEs and genes replicated across tissues.

Results

TE-derived transcripts are quantified across 16 tissues and 697 samples of tumor-adjacent controls.

We re-aligned and quantified TE-derived transcripts from the RNA sequencing data of 697 samples across 16 tissues collected as non-tumorous controls for the TCGA project (Supplementary Table 1). The library sizes for these samples range from 50M reads at the minimum up to 390M reads, with a median of about 149M reads (75M pairs). Although all tissues included in this study were sequenced using the HiSeq 2000 platform, esophagus and stomach samples were sequenced separately at the British Columbia Genome Sciences Centre (BCGSC), with higher sequencing depth on average (median 227M reads). The proportion of reads that do not map to annotated genes was different between the later samples sequenced at the University of North Carolina at Chapel Hill (UNC) and the earlier BCGSC-sequenced samples, with BCGSC samples having more reads (median 177M) not mapping to annotated genes and discarded, while UNC samples had fewer reads discarded (median 97M), possibly due to the difference in poly-A enrichment protocol (MultiMACS mRNA isolation kit vs. TruSeq RNA Library Prep Kit [27]), among many other differences, including read length. Because of these differences, when comparing across tissue samples, we had to consider esophagus and stomach tissues separately, and they could not be compared against the rest of the tissues. Despite the differences in the overall sequencing depth and overall proportion of reads mapping to genes, we found that the DESeq2 normalization method normalizes the reads effectively, and the correlation due to library size disappears after normalization within tissues (Supplementary Figure 1). Our co-expression analysis is done within each tissue separately. We also replicate our results found in esophagus and stomach with similar results found in at least one other tissue.
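
To make the normalization step concrete, the following is a minimal sketch of median-of-ratios size-factor estimation, the default scheme in DESeq2. It is an illustration rather than the pipeline used here: the text states only that DESeq2 normalization was applied, so the use of the default estimator is an assumption, and the function names `size_factors` and `normalize` are hypothetical. DESeq2 itself is an R package; Python is used only for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of features (genes or TEs), each a list of raw read
    counts with one entry per sample. Returns one factor per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Features with a zero count have a zero geometric mean and are skipped.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio to the geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Toy example: the second library is sequenced exactly twice as deep.
raw = [[10, 20], [100, 200], [5, 10]]
sf = size_factors(raw)
norm = normalize(raw)
```

In this toy input the second size factor is twice the first, so the normalized counts of each feature agree between the two samples, which is the sense in which the dependence on library size is removed.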

Although we find reads mapping to TEs in all the samples that we have examined, the overall transcripts coming from TEs are still a relatively tiny proportion of the total library. The total number of reads mapping to TEs ranged from 137K to 2.1M with a median of 615K for UNC samples, and ranged from 282K to 3.3M with a median of 835K for BCGSC samples. This count excludes some of the potential read-through transcripts as described in the next section. The total read counts across all TEs amount to about 1.2% (UNC) or 1.7% (BCGSC) of the total reads mapping to known genes in non-tumorous tissue samples.

TE reads originating from pre-mRNAs or retained introns are corrected by comparing the read depths of the flanking introns.

There have been previous reports of transposon reads coming from pre-mRNA or retained introns in the mature RNA of genes that contain TE sequences in their introns [28]. The extent of this problem can be partially estimated by comparing the read depths of the transposon to the read depths of the flanking introns. If the reads mapping to TEs are part of the pre-mRNA or retained introns, we should see continuous mapping of reads that span the introns flanking the TE of interest, and observe reads that map across the intron-TE boundaries. We can also partially correct for this problem by utilizing the read depths in the flanking introns to proportionally reduce the number of total reads mapped to TEs. The approach is described below.

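\[
R_{IL} = \frac{\mathit{count}_{IL}}{\mathit{len}_{IL} - \mathit{read\_len}}
\]

\[
\mathit{count}_{TE}' =
\begin{cases}
\mathit{count}_{TE} - \mathit{count}_{TE} \times \dfrac{R_{IL} + R_{IR}}{2 R_{TE}}, & \text{if } \dfrac{R_{IL} + R_{IR}}{2 R_{TE}} < 1 \\[2ex]
0, & \text{otherwise}
\end{cases}
\]

TE: focal TE
IL: intron left to TE. IR: intron right to TE.
R_{IL}: read depth of the intron left to the TE
count_{IL}: read counts mapped to the intron left to the TE (includes multi-mapped reads)
len_{IL}: length of the left intron
read_len: length of the sequencing read
count_{TE}': count of reads mapped to TE after the correction.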
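
The correction can be sketched in a few lines of code. This is an illustrative sketch, not the modified TEtranscripts code. The depth formula is given only for the left intron, so the sketch assumes that the depths of the right intron and of the TE itself are computed the same way. The function names and the toy numbers are hypothetical.

```python
def read_depth(count, length, read_len):
    """Read depth R = count / (len - read_len), as defined for the left intron."""
    effective = length - read_len
    if effective <= 0:
        raise ValueError("feature must be longer than the read length")
    return count / effective


def corrected_te_count(count_te, len_te, count_il, len_il,
                       count_ir, len_ir, read_len):
    """Discount TE reads in proportion to the mean depth of the flanking introns.

    Assumption: R_IR and R_TE follow the same depth formula as R_IL.
    """
    if count_te == 0:
        return 0.0
    r_te = read_depth(count_te, len_te, read_len)
    r_il = read_depth(count_il, len_il, read_len)
    r_ir = read_depth(count_ir, len_ir, read_len)
    ratio = (r_il + r_ir) / (2 * r_te)
    if ratio < 1:
        return count_te - count_te * ratio
    # Flanking introns are at least as deep as the TE: treat all reads as read-through.
    return 0.0


# A TE much deeper than its flanks keeps most of its reads.
mostly_kept = corrected_te_count(100, 348, 10, 1048, 30, 1048, read_len=48)
# A TE no deeper than its flanks is reduced to zero.
zeroed = corrected_te_count(100, 348, 500, 1048, 500, 1048, read_len=48)
```

In the first toy call the TE depth is 100/300 and the mean flanking depth is 0.02, so 6% of the reads are discounted and 94 remain. In the second the flanking depth exceeds the TE depth and the count drops to zero.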

We modified the software TEtranscripts [29], following this approach, to discount the TE read counts based on the read depths of the surrounding introns. By looking for large differences after correcting by flanking read depth, we identified TEs that are most frequently transcribed as part of the introns (Table 1). We also found cases where the method corrected for erroneous TE quantifications due to TEs embedded within long non-coding RNAs (lncRNAs). For example, an AluSx1 element on chromosome Y at position 21153222 (AluSx1_dup59209) had very high transcript levels, with an average read count of 18863 in thyroid and head and neck tissue, but the Alu element is embedded in a lncRNA gene called TTTY14. The reads mapping here are counted as AluSx1 transcripts based on the UCSC TE annotation, but in the alignment we see that there are reads spanning the boundaries of AluSx1_dup59209, and almost all the reads mapped in the region are uniquely mapped reads. It looks to be a case of Alu domestication, where an Alu insertion or a secondary duplication of an original Alu insertion became part of a testis-specific RNA gene [30]. Three examples, AluSx1, L2a, and L1MA7, where the read counts for the transposons are reduced to zero are visualized in Supplementary Figure 2. AluSx1_dup59209 (chrY:21153222-21153521) is embedded within an exon of gene TTTY14. L2a_dup21781 (chr2:113980079-113981081) is embedded in an intron of PAX8. L1MA7_dup4297 (chr8:134015602-134015763) is embedded in an intron of gene TG. In all three cases, read counts for the focal TEs were reduced to zero after the correction described above, and the reads mapping to these TEs did not contribute to the overall TE family count. If one is interested in transposable element transcript levels that are not part of a longer RNA molecule, it is important to take into account the read depths of the flanking introns or exons, especially using the latest non-coding RNA gene annotations, when quantifying repeat element transcripts in the genome.

Relying on uniquely mapped reads for repeat quantification results in quantification biased for mappable elements.

Due to the difficulty of mapping reads to repeat elements, one of the approaches taken is to count only the reads that map to a unique position in the genome. But this approach has repeatedly been shown to produce results that are worse than expectation-maximization [31,32], and can lead to serious biases. If we only counted uniquely mapped reads in our analysis, not only would we throw away from 10.7% up to 45% (median 14.2%) of the total TE transcripts, we would throw away data in a biased manner, such that we would end up “quantifying mappability” instead of “quantifying transcripts”. This problem is especially pronounced when quantifying the young and active L1HS element. To assess the effect of alignment on quantification, we tried two different alignments, one based on the STAR aligner with up to 200 multi-mapped positions, and the other based on Bowtie1 with only a single best alignment position, discarding all reads that do not have a unique best mapping. Figure 1 shows two cases that illustrate the limitations of either of the approaches. Figure 1a shows an example of a full-length L1HS locus on chromosome X:75453754-75459553 with high mappability (48-base mappability shown at the top of the panel) due to many accumulated mutations in its sequence. The top panel shows the Bowtie1 alignment, allowing only uniquely mapping reads with a single best hit, and the bottom panel shows the STAR alignment with multi-mapping up to 200 mappings per read. In the STAR alignment, we can see erroneously split read alignments at the 3’ end that result in reads mapping across distances greater than 10K, which shows a limitation of splicing-oriented alignment software. The transcription for this element does not start at the 5’ end of the full-length element, but there is clear and unambiguous transcription starting from about 1500 bases in, which is congruent between both alignments. Figure 1b shows another full-length L1HS locus on chromosome X:11953208-11959433, this time a young element with very low mappability. Comparing the top and bottom panels, we can see that with the unique mapping we are ignoring all the reads that map perfectly to this locus but also map to multiple other locations. There is a huge pile-up at the 5’ end of the full-length element. If we look at the reads mapping to the 5’ end of this locus, their NH tags show numbers ranging from 2 to 4, meaning that they are mapping to two to four alternative locations in the genome. Considering that L1HS loci containing the 5’ ends are more likely to be full-length elements, these reads are more likely to be coming from one of the few full-length L1HS loci in the genome, but we end up ignoring these reads if we are only counting uniquely mapping reads. On the other hand, with multi-mapping, we end up quantifying with large uncertainty on whether the reads piled up in this region are really transcribed from this particular locus. This is evident from the small regions of extremely high pile-ups that reflect fragments that are found in the genome with high frequency. We should point out, however, that we don’t count all the reads aligning here at face value, since the expectation-maximization algorithm will down-weight the counts of reads by the number of places each read maps to.
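
The down-weighting of multi-mapped reads can be illustrated with a generic expectation-maximization sketch. It is not the TEtranscripts implementation, whose details are not given in the text: it ignores feature length and alignment quality, the locus names are hypothetical, and the function `em_allocate` is introduced only for this example. Each read starts out split evenly across its candidate positions (weight 1/NH) and is then re-assigned in proportion to the current abundance estimates.

```python
def em_allocate(reads, n_iter=100):
    """Allocate multi-mapped reads by expectation-maximization.

    reads: list of tuples, each holding the candidate loci of one read.
    Returns the expected read count per locus.
    """
    loci = sorted({locus for read in reads for locus in read})
    # Initialization: split each read evenly across its candidates (1/NH).
    counts = {locus: 0.0 for locus in loci}
    for read in reads:
        for locus in read:
            counts[locus] += 1.0 / len(read)
    for _ in range(n_iter):
        updated = {locus: 0.0 for locus in loci}
        for read in reads:
            # E-step: share the read in proportion to current abundances.
            total = sum(counts[locus] for locus in read)
            for locus in read:
                updated[locus] += counts[locus] / total
        # M-step: the new abundances are the summed fractional assignments.
        counts = updated
    return counts


# Toy data: 8 reads unique to locus A, 2 unique to locus B, 10 shared by both.
toy_reads = [("A",)] * 8 + [("B",)] * 2 + [("A", "B")] * 10
estimates = em_allocate(toy_reads)
```

With these toy reads the even split gives A = 13 and B = 7, and the iterations converge to A = 16 and B = 4: the shared reads end up divided 4:1, matching the ratio of uniquely mapped evidence. A locus with no unique support can therefore still receive reads, which is the source of the uncertainty described above.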

Comparing the mappability of the two examples, we can see that the uniquely mapping approach preferentially counts reads coming from older TE loci with higher mappability. This can also be shown by correlating the locus-level read counts for each L1HS element against the length of uniquely mappable positions in each TE locus (Figure 2a). For the unique mapping approach, we see there is significant correlation between the locus-level quantification and the total length of uniquely mappable positions within that locus (p-value = 5.523e-07 for sum of read counts across all samples, and p-value = 1.072e-13 for maximum read count among all samples). The multi-mapping approach with Expectation Maximization does not show that bias for uniquely mappable regions (p-value = 0.60 for sum and p-value = 0.08 for max) (Figure 2b).

Consequently, there is limited correlation in the locus-level quantification of L1HS between the uniquely mapped reads and the multi-mapped reads (Figure 2c). This shows the difficulty of quantifying young active elements such as L1HS using genome-wide RNA-seq data. Due to these limitations, analysis on L1HS in this study has been done at the family level. The family-level quantification of L1HS still shows variability based on the read mapping approach (Supplementary Figure 3e), but it shows stronger correlation than the locus-level quantification. We still wanted to utilize the abundance of RNA-seq data available for studying L1HS transcription, and glean information on L1HS from these data. Based on the observation that 3’ ends of L1HS are frequently represented in fragmented L1HS loci, while the 5’ ends are more frequent in full-length L1HS loci, we decided to use the read counts of the 5’ end of the element as a measure that better represents the transcript level of full-length L1HS transcripts in the sample. All the following analyses of L1HS expression are based on the reads mapped to L1HS sequences in the genome that align with the first 300 bases of the 5’ end of the L1HS consensus sequence, allowing for multi-mapping.

On the other hand, we found that quantification of older elements showed very strong correlation between the two approaches, unique mapping and multi-mapping, reflecting the higher mappability of older elements in the genome (Supplementary Figure 3). For the locus-level co-expression analysis with the Zinc Finger Proteins, we limited our analysis to older elements that are 100% uniquely mappable across their sequences with a 48-base read length. All of our main results are qualitatively replicated in the data with uniquely mapped reads aligned with Bowtie, except for the results regarding the 5’ end of L1HS expression.

TE expression shows tissue-specific expression patterns at the locus level among somatic tissues

There have been multiple reports of tissue-specific expression of TEs in the human genome, starting from Faulkner et al. in 2009 [14] to Philippe et al. 2017 [12] more recently. We also found highly distinct tissue specificity in TE transcripts in the TCGA data, such that we could cluster each sample into its broader tissue grouping based on locus-level TE expression patterns alone, without relying on any genes at all. Figure 3 shows the clustering of tissues for family-level and locus-level quantification of LTRs, DNA transposons, SINEs and LINEs. We used normalized mutual information between the different clustering results and the ground truth (the true tissue group) to evaluate the quality of clustering. Normalized mutual information was compared for clustering results based on gene expression, family-level TE expression, locus-level TE expression and random assignments. We found that the locus-level TE expression was as predictive of tissue groupings as the genes (Figure 3i, Supplementary Table 2). LTRs, DNA transposons, LINEs and SINEs gave clustering accuracy similar to that of the genes. The TE family expression levels did not have enough information to cluster the tissues correctly. We note here that these are loci selected by the rank of variance in log2 normalized read count across all samples regardless of tissue type, and we haven’t done any differential expression analysis to identify the markers that are the most informative for accurate tissue classification. Thus, the classification performance we observe here is not the optimal performance that we could get if we were to decide on the markers based on a trained classifier. When we excluded TEs that are within 1K, 10K, and 100K of the start and ends of the genes, the accuracy declined, so part of the tissue specificity is due to co-location with tissue-specific genes. But even when relying on TEs at least 100K away from any known genes, we saw that tissue-specific information was largely retained. On the other hand, when we focused on younger elements, HERVs within LTRs and young L1s within LINEs, there was a large reduction in information content, especially for young L1s. Clustering based on locus-level expression for L1HS, L1PA2 and L1PA3 was not any better than clustering based on family-level expression of all LINEs. We suspect this is due to the lower locus-level mappability and large uncertainty in locus-level expression quantification for young L1 elements. Figure 4 shows fifteen representative TE loci that show tissue-specific expression. These loci are chosen from TEs that are at least 100K away from the start or end of any gene.
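
As an illustration of the evaluation metric, normalized mutual information (NMI) between a clustering and the true tissue labels can be computed as sketched below. Several normalizations of mutual information are in use and the text does not say which one was applied; this sketch assumes the geometric mean of the two entropies. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, predicted):
    """NMI = I(truth; predicted) / sqrt(H(truth) * H(predicted)).

    Assumption: geometric-mean normalization; other variants exist.
    """
    n = len(truth)
    joint = Counter(zip(truth, predicted))
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (truth_counts[t] * pred_counts[p]))
    h_truth, h_pred = entropy(truth), entropy(predicted)
    if h_truth == 0 or h_pred == 0:
        # Degenerate case: a single group on one or both sides.
        return 1.0 if h_truth == h_pred else 0.0
    return mi / math.sqrt(h_truth * h_pred)


tissues = ["liver", "liver", "lung", "lung"]
perfect = normalized_mutual_information(tissues, [1, 1, 0, 0])
uninformative = normalized_mutual_information(tissues, [0, 1, 0, 1])
```

A clustering that reproduces the tissue groups up to relabeling scores 1, and one that is independent of the tissue groups scores 0, which is why random assignments serve as the baseline in the comparison.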

The granular levels of locus-specific TE expression contained tissue-specific information, but the overall transcript level of TE classes did not show significant variation across tissues (Figure 5a). The higher expression levels for TEs seen for esophagus and stomach are confounded with the differences in sequencing protocol described above, so they are not directly comparable to the rest of the tissues. When focusing on 300 bases in the 5’ end of L1HS, it showed some variation across tissues, with higher levels in the head and neck tissues and lower levels in the liver, consistent with the previous observations in adult human tissues [6] and in human cell lines [12], albeit with large within-tissue variance (Figure 5b). Although we cannot directly compare L1HS expression in esophagus and stomach to the rest of the tissues, we can tell that there is clear transcription of the 5’ end of L1HS in esophagus and stomach. Figure 5b shows the normalized read counts in log2 scale, with a median of more than 500 reads mapping to 300 bases at the 5’ end of L1HS for esophagus and stomach (median library size 227M reads). There have been observations of full-length L1HS expressed in the adult esophagus and stomach tissue, at about 80% and 150% relative to the levels in HeLa cells [6], and active L1 retrotransposition in premalignant precursor lesions of esophageal adenocarcinoma [33].

Co-expression analysis of intergenic TEs identifies core TE modules and correlated Zinc Finger Proteins.

Co-expression network analysis is an appropriate approach to examine the co-expression across different TE families and host genes together. In order to identify the common gene/TE modules that are correlated across different tissues, we did a consensus network analysis across tissues using the weighted gene co-expression network analysis in the WGCNA package [34]. For the TE family transcripts in this analysis, we only included intergenic TEs, i.e. we only counted reads mapping to TEs that are at least 1Kb away from any start and end of known genes. We identified 61 modules across 11 groups of tissues, combining certain tissue types together as a broader group (colon and rectum, esophagus and stomach, kidneys, lungs). Among the 20531 genes and 992 TE families that were quantified in the 697 samples, 18670 genes and 923 TE families had enough expression level and variation to be included in the network analysis. Among those 19593 genes and TEs, 9599 genes and 658 TEs were clustered into a module of co-expression, while 9336 did not belong to any defined module. The list of modules, correlations between the modules, and topological adjacency matrix that defines the modules are visualized for the breast tissue in Supplementary Figure 4. Visualizations for other tissues were similar, as we looked for consensus modules across all tissues. There were only a few modules that contained TE transcripts: only seven modules contained more than ten TE families within the module. Supplementary Table 3 shows the distribution of TE families in these seven modules. We considered modules M8, M21, M38 and M45 as core TE modules, as their membership consisted mainly of TE families (marked by * in Supplementary Figure 4).
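
The first step of a weighted co-expression network, turning pairwise correlations into soft-thresholded adjacencies, can be sketched as follows. This illustrates the general WGCNA idea only. WGCNA is an R package, and the text does not report the network type or soft-thresholding power used, so the `signed` option and `beta=6` are assumptions, as are the feature names and toy profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(profiles, beta=6, signed=False):
    """Soft-thresholded adjacency between all pairs of features.

    unsigned: a_ij = |cor_ij| ** beta
    signed:   a_ij = ((1 + cor_ij) / 2) ** beta
    """
    adjacency = {}
    names = list(profiles)
    for a in names:
        for b in names:
            if a == b:
                continue
            r = pearson(profiles[a], profiles[b])
            value = (1 + r) / 2 if signed else abs(r)
            adjacency[(a, b)] = value ** beta
    return adjacency


# Hypothetical normalized expression profiles across four samples.
toy_profiles = {
    "TE_family_1": [1.0, 2.0, 3.0, 4.0],
    "TE_family_2": [2.0, 4.0, 6.0, 8.0],
    "immune_gene": [4.0, 3.0, 2.0, 1.0],
}
unsigned_adj = soft_adjacency(toy_profiles)
signed_adj = soft_adjacency(toy_profiles, signed=True)
```

The choice of network type matters for negatively correlated features: in the unsigned form a perfectly anti-correlated pair is as strongly connected as a perfectly correlated one, whereas the signed form separates them.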

The correlation between closely related TE subfamilies is expected, because reads from transposons that map to sequences that are indistinguishable between subfamilies are assigned to multiple subfamilies with proportional weight by TEtranscripts using an Expectation-Maximization algorithm. Closely related families such as L1HS and L1PAs also share common regulatory elements at the 5' end. But we find that the correlated TE families in a TE module span different classes of TEs, and are replicated even when counting uniquely mapped reads only. Considering there is no sequence similarity between the SINEs, DNA transposons, LTRs and the LINEs, the correlation among these diverse classes of transposons is probably due to a common regulation, or dys-regulation, that is de-repressing these transposons at the same time. There have been reports of such co-expression of ERVs and LINEs in cancerous tissues [35,36], possibly through concordant hypomethylation [37].

There was one class of host genes that was frequently found among the members of the TE co-expression modules: the KRAB Zinc Finger Proteins (KZFPs). Table 2 shows the list of KZFPs that were identified as TE module members.

Expression of immune genes is negatively correlated with intergenic TE expression.

Once we identified modules consisting mostly of transposon families, we also examined whether any co-expression modules were negatively correlated with TE modules. We found two modules, M33 and M35, that showed consistent negative correlation across tissues. The genes included in these modules were genes involved in the innate immune system, interferon signaling, the immunoproteasome, etc. (Figure 6). Figure 6 shows the enriched annotation terms detected for both modules through the Reactome database, the top 30 genes with highest module membership for the two modules, and the correlation plot between TEs in the TE modules M8, M21, M38 and M45, and genes in modules M33 and M35 in the tissues breast (Figure 6d) and esophagus/stomach (Figure 6e). Only genes and TEs that show greater than 0.6 Pearson correlation with the representative profile of the module in all tissues have been included in the correlation plot. We observe high correlation within groups and contrasting negative correlation between groups.
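The membership filter described above (Pearson correlation greater than 0.6 with the module's representative profile in every tissue) can be sketched as follows. This is only an illustration under stated assumptions: the text does not spell out how the representative profile was computed, so the sketch approximates it by the mean of the standardized member profiles, and the function names and input layout are hypothetical.

```python
import math


def zscore(values):
    """Standardize a profile to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def representative_profile(profiles):
    """ASSUMPTION: module profile = mean of standardized member profiles."""
    standardized = [zscore(p) for p in profiles.values()]
    n_samples = len(standardized[0])
    return [sum(p[i] for p in standardized) / len(standardized)
            for i in range(n_samples)]


def filter_members(profiles_by_tissue, threshold=0.6):
    """Keep features whose correlation with the module profile exceeds
    the threshold in every tissue.

    profiles_by_tissue: {tissue: {feature_name: [expression per sample]}}
    """
    kept = None
    for profiles in profiles_by_tissue.values():
        rep = representative_profile(profiles)
        passing = {name for name, p in profiles.items()
                   if pearson(p, rep) > threshold}
        kept = passing if kept is None else kept & passing
    return kept or set()
```

Taking the intersection across tissues mirrors the "in all tissues" requirement: a gene or TE that passes in breast but not in esophagus is dropped.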

Co-expression analysis including intronic TEs reveals negative correlation between intronic TE expression and mitochondrial gene expression.

When we included intronic TE transcripts in the overall TE expression levels, the co-expression analysis led to a different picture from the analysis of intergenic TEs. When intronic TEs are included, a single module, N1, emerges as the dominant TE module, containing 612 out of 848 TE families (72%) that were assigned a module membership. In fact, N1 contains 72% of all TE families but only 2% of all genes.

Table 3 shows genes that are significantly correlated with module N1 in multiple tissues. A pattern immediately noticeable is that there are many pseudogenes, intronic transcripts, antisense RNAs, and long non-coding RNAs on the list. It looks like with intronic TEs, we are detecting a cell state that is dysregulated in splicing or mRNA quality control, and as a result, we are seeing a global elevation of pervasive transcription that is generally non-functional. Multiple protein coding genes on the list are involved in mRNA splicing regulation, such as NCRNA00201, an isoform of HNRNPU which showed strong correlation with the intronic TE module in seven different tissues, as well as CCNL2, LUC7 and LUC7L3, perhaps as a response to the dysregulated splicing. Another interesting gene in the list is NKTR, hinting at the presence of immune cells in the tissue samples with high intronic TE expression. This is in contrast with the negative correlation we observe between immune genes and intergenic TE expression.

The module that was negatively correlated with the intronic TE module (N1) included co-expression clusters consisting of mitochondrial proteins and ribosomal proteins (N4). N4 was the only module that was consistently negatively correlated with N1, with a correlation coefficient below -0.7 across all tissues. Figure 7 shows the correlation plots between TEs in N1 and genes in the mitochondrial gene module N4, for breast and esophagus. Enriched annotation terms for the genes found in the Reactome database are centered around translation and mitochondria. One intriguing possibility may be that the failed splicing and mRNA control is leading to a suppression of translation that in turn leads to reduced RNA levels of mitochondrial and ribosomal genes.

We again saw an enrichment of KZFPs as members of the intronic TE module N1, and another module N10, that was positively correlated with N1 (Supplementary Table 4). The list of KZFPs had some overlap with the KZFPs co-expressed with intergenic TE modules, but there were some differences as well. We combined the 22 KZFPs in modules N1 and N10 and examined whether there was any common transcription factor binding for these genes in the ENCODE ChIP-seq data with the EnrichR database [38]. The regions near these KZFPs were enriched with binding of GABPA, a regulator of nuclear encoded mitochondrial genes, in multiple cell lines (Supplementary Figure 5). This was interesting, given the negative correlation observed between intronic transposons and nuclear encoded mitochondrial gene expression described above.

Genes co-expressed with L1HS include genes regulating major signaling pathways, chromatin, and stress response.

Given the interest in the active element L1HS, and the uncertainty in L1HS quantification, we decided to limit the quantification to the 5’ region of L1HS, and examine the host genes that are specifically correlated to the expression of the 5’ region of L1HS without regard to the co-expression modules. In order to control for the correlation with other TEs, especially intronic TEs, we included the representative profile of N1 as a covariate in our linear model. One concern with co-expression analysis is positional overlap. There were 14 genes that overlapped with the L1HS loci we were counting the reads from. Only 1 of the 14 genes, RAB3GAP2, showed significant correlation with L1HS 5’, and was removed from the final list. 56 genes were identified as negatively correlated, and 77 genes were identified as positively correlated with L1HS 5’ in at least two tissues (Figure 8, Supplementary Table 5). Notable genes include RASA1, RASA2, RRAS, EGFR and MAPK1, in the Ras-MAPK pathway; ECSIT, TAB3 and TRAF6, regulators of the NF-κB pathway; RNASEH2C, a known L1HS repressor [39]; TET2, known to bind to and demethylate young L1s [40]; THAP7, a histone tail binding transcription repressor [41]; and DDI2, a protease that cleaves and activates NFE2L1/NRF1 [42]. Multiple genes in the respiratory electron transport pathway, ECSIT, NDUFA1, NDUFA8, NDUFB10, NDUFB8, SURF1, UQCR11 and UQCRB, were negatively correlated with L1HS 5’, even after controlling for the covariation with intronic TEs, N1. The whole list of genes is reported in Supplementary Table 5.
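To make the role of the N1 covariate concrete, the following minimal sketch fits an ordinary least squares model with an intercept, the L1HS 5’ level, and the N1 representative profile as predictors. It is an illustration only: treating gene expression as the response, and the function names themselves, are assumptions, and the text does not specify the exact model or significance test that was used.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_gene_with_covariate(gene, l1hs, n1_profile):
    """OLS fit of gene ~ intercept + L1HS 5' level + N1 profile.

    ASSUMPTION: gene expression is the response variable.
    Returns [intercept, l1hs_coefficient, n1_coefficient].
    """
    rows = [[1.0, x, z] for x, z in zip(l1hs, n1_profile)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, gene)) for i in range(3)]
    return solve_linear_system(xtx, xty)
```

In a model of this shape, the L1HS coefficient describes the association that remains after the shared variation with the intronic TE module has been accounted for, which is the purpose of the covariate in the paragraph above.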

We checked whether our list of negatively correlated genes overlapped with the genes identified through a CRISPR–Cas9 screen [21]. Of the 56 negatively correlated genes, three genes, RNASEH2C, HAUS7 and RNF166, were also on the list of 253 secondary screen hits. There was no overlap among the 77 positively correlated genes.

We also checked whether there were transcription factors known to bind to the L1HS sequence [43] in our list. Of the 77 positively correlated genes, four genes, YY1, REST, ELF1 and ZBTB33, were identified to bind to L1HS [43]. There was no overlap among the 56 negatively correlated genes. To check if the same transcription factors are regulating the correlated genes and L1HS, we also checked what kind of TF binding is observed upstream of our correlated genes. There were a few enrichments of ENCODE transcription factor binding upstream of our list of correlated genes (Supplementary Figure 6), but except for YY1, the enriched TFs did not overlap with the list of Sun et al. [43].

TE module expression is correlated with radiation exposure in thyroid tissue.

We examined whether any of the clinical variables were associated with the TE module expression or the L1HS expression levels. We tested the variables age, days to death, pathological stage, T staging, N staging, M staging, gender, radiation and race for each tissue type. No variable was found to be associated with L1HS 5’ expression. Radiation therapy was the only clinical variable associated with module N1 (intronic TE module) expression in the non-tumorous tissue of thyroid (p-val = 0.00894, Supplementary Figure 7).

Co-expressed TEs and KRAB-ZFPs show limited overlap with ChIP-seq binding

Based on the positive correlation observed among KZFPs and TE modules, and existing literature on the role of KZFPs for TE repression, we decided to examine the correlated expression of all pairs of 979 TE families and 366 KZFPs. The most striking pattern observed was that KZFPs and TEs show overwhelmingly positive correlation and little negative correlation. Chromosome 19, where the majority of the KZFPs are clustered, is also the chromosome with the highest density of transposable elements. This unique structure of chromosome 19 may lead to TEs embedded in KZFP genes being erroneously identified as co-expressed. We avoid the confounding effect of positional overlap between TEs and KZFPs by only counting reads mapping to TEs that are in the intergenic region at least 1 Kb away from any gene. There may be residual correlation due to shared genomic environment at a larger scale, such as the chromatin state. But that doesn’t explain all the positive correlation, because when we look at the locus level correlation, we find that the individual TE loci correlated with the ZNFs are scattered across all chromosomes, and not necessarily enriched on chromosome 19.

The co-expression between KZFPs and TEs was observed across almost all TE families, as 794 TE families had at least one co-expressed ZFP in at least one tissue. Certain ZFPs, such as ZNF621, ZNF780B, ZNF84, ZNF33A, and ZNF662, showed correlation with a wide range of TE families in multiple tissues. The TE-KZFP pairs HERVK14-int:ZNF814, MER57A-int:ZNF621, and MSTB-int:ZNF41 were the most frequent pair-wise co-expression observed between TE families and KZFPs, found positively correlated in six different tissues. The ZFPs that were negatively correlated with TEs were ZNF511 and ZNF32, but they are not classified as KZFPs as they do not have a KRAB domain.

We looked at the family level co-expression between TE families and KZFPs and tested the overlap against the KZFP bound TE family enrichment reported in the ChIP-exo study (GSE78099 [44]). We found that there is a statistically significant association between co-expression and binding (p-value < 2.2e-16). But the number of overlapping pairs was very small. Figure 9 shows the overlap between co-expression and binding enrichment. We only mark the co-expression found in at least two tissues, and we have omitted the TE-KZFP combinations that have neither co-expression nor binding enrichment from the figure. The total number of combinations tested that overlap between the two datasets is 200,889 (221 KZFPs x 909 TE families). 4138 pairwise co-expressions were observed in at least two tissues. Of those, only 119 were enriched for binding in the ChIP-exo study [44].
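An association between two sets of pairs drawn from the same pool, as in the co-expression versus binding comparison above, can be assessed with a hypergeometric upper-tail probability, which is equivalent to a one-sided Fisher exact test. The sketch below is illustrative only: the text does not name the test that produced the reported p-value, the function name is hypothetical, and the number of binding-enriched pairs is not given in this section, so it is left as an input rather than filled in.

```python
from math import comb


def overlap_p_value(total_pairs, bound_pairs, coexpressed_pairs, overlap):
    """Probability of at least `overlap` shared pairs under independence.

    total_pairs:       all TE-KZFP combinations tested
    bound_pairs:       combinations with binding enrichment
    coexpressed_pairs: combinations with co-expression
    overlap:           combinations with both
    """
    upper = min(bound_pairs, coexpressed_pairs)
    denominator = comb(total_pairs, coexpressed_pairs)
    numerator = sum(
        comb(bound_pairs, k)
        * comb(total_pairs - bound_pairs, coexpressed_pairs - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator
```

Exact integer arithmetic keeps the sketch short and free of rounding concerns. For pair counts on the scale reported here (about 200,000 combinations), a log-space or library implementation would be more practical.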

To check how the co-expression is observed at individual TE loci, we took the TE family-KZFP pairs that show correlated expression, and further tested co-expression between individual TE loci of the correlated TE family against the KZFP of interest. With correlations at the locus level, we were able to examine the locus level co-expression and compare it directly to the binding peaks reported in Imbeault et al. [44]. Of the 6258 co-expressed TE loci where the KZFP had been assayed with ChIP-exo, there were only 4 that were bound by the same KZFP. We do not have a good explanation for why there is a lack of overlap between co-expression and binding at the locus level, when there was at least some amount of overlap at the family level. It looks like the co-expression we observe is a result of indirect interactions, and not necessarily direct binding.

We also observed that at the locus level, there was not a lot of overlap between the TEs that are bound by KZFPs in [44], and the TEs that are expressed in the TCGA non-tumorous tissues, regardless of the co-expression relationship with KZFPs. Here “expressed” means there is at least one sample in our data with more than five reads mapped to the TE locus, and “binding” means that there is a peak detected in the GSE78099 ChIP-seq data overlapping with the TE locus with a ±250 bp buffer. Figure 10 shows the overall breakdown of the 4.5 million transposons annotated in the hg19 UCSC RepeatMasker track. Statistically, there is more overlap than expected (p-value < 2.2e-16) between binding and expression, but the TEs that are both expressed in at least one sample and bound by at least one ZFP are a tiny proportion (2.6%) of all TEs in the genome.
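The two definitions above translate directly into simple predicates. The following sketch assumes half-open genomic intervals given as (chromosome, start, end) tuples and applies the 250 bp buffer to the TE locus rather than to the peak. Both choices are assumptions, since the text does not specify the coordinate convention or which interval is extended, and the function names are hypothetical.

```python
def is_expressed(counts_per_sample, min_reads=5):
    """True when at least one sample has more than `min_reads` reads
    mapped to the TE locus."""
    return any(count > min_reads for count in counts_per_sample)


def is_bound(locus, peaks, buffer_bp=250):
    """True when any ChIP peak overlaps the TE locus extended by the buffer.

    locus: (chrom, start, end), half-open coordinates (ASSUMPTION)
    peaks: iterable of (chrom, start, end) tuples
    """
    chrom, start, end = locus
    lo, hi = start - buffer_bp, end + buffer_bp
    return any(
        peak_chrom == chrom and peak_start < hi and peak_end > lo
        for peak_chrom, peak_start, peak_end in peaks
    )
```

A linear scan over peaks is enough to show the logic. For the 4.5 million annotated loci mentioned above, a sorted-interval or interval-tree lookup would be the practical choice.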

One interesting pattern did emerge when we examined the overlap with epigenetic marks of candidate cis-Regulatory Elements defined in the ENCODE data [45]. We were more likely to see an enhancer-like mark (DNase + H3K27ac) for TE loci that are expressed compared to non-expressed TEs, and we were more likely to see a promoter-like mark (DNase + H3K4me3) for ZFP bound TE loci compared to TEs with no binding (Figure 10). The 2.6% of TE loci that are expressed in at least one sample and bound by at least one ZFP showed the highest proportion of both promoter-like marks and enhancer-like marks. When we divided the TE loci into gene regions (genes including introns and ±1 Kb flanking region) and intergenic regions (1 Kb away from start and end of genes), the overall pattern remained the same, except that TE loci were twice as likely to be expressed if they are close to genes compared to intergenic regions, and the TE loci were twice as likely to be overlapping with the promoter-like marks (Supplementary Figure 8). The enhancer-like marks showed no difference between gene regions and intergenic regions, and the CTCF marks increased in the intergenic regions.

Discussion

Limitations to the quantification and correction

Quantifying transposon transcripts is a difficult problem, due to their ambiguity in short read mapping because of repeated content in the reference genome. Current state-of-the-art methods rely on Expectation-Maximization to account for the uncertainty in multi-mapped reads [29]. Focusing only on uniquely mapped reads doesn’t really solve this problem, and will lead to biased quantification, favoring older elements with higher mappability. Scott et al. have demonstrated that by relying on unique mutations found within individual L1HS loci, and by including sequences of non-reference polymorphic L1HS loci, it is possible to identify the source of the L1HS activity with substantial success [46]. But in our study we did not attempt to identify the individual loci of L1HS transcription, and instead focused on the totality of reads mapping to regions of annotated L1HS that align to the 5’ end of the L1HS consensus sequence.

Another complication in TE transcript quantification is that TEs are frequently embedded within introns that are transcribed before they are processed, or sometimes fail to be spliced out, or embedded within exons or non-coding RNAs that are expressed in different conditions [28]. To account for this source of error, we introduced a method to correct for TE reads coming from retained introns or pre-mRNA. Although we observed large corrections for specific transposable elements embedded within introns, the correction is not complete. We can tell this from the observation that the co-expression profiles of intronic TEs are different from the co-expression profiles of intergenic TEs away from the genes. The genes co-expressed with intronic TEs include pseudogenes, intronic transcripts, anti-sense transcripts and genes with functions in splicing. A more accurate approach would be to correct for the read counts from retained introns before the EM algorithm based on the read depth of uniquely mapped reads, and then run EM based on the corrected counts. But estimating the read depth of the repeat region using uniquely mapped reads is a difficult problem. The effective length of the uniquely mapped region is difficult to estimate, because again mappability varies from locus to locus for any TE, depending on the unique mutations it has accumulated. So, for this study, we decided to use the easier approach: run EM first, probabilistically assign the TE reads, and then correct based on the expected read depth across the length of the TE locus. An important future study would be to study the mappability of individual TE loci carefully, including the known polymorphic sites, and to design software for TE quantification that can take into account the mappability of each locus in its EM algorithm, as well as correct for the retained introns while considering the effective length of the uniquely mappable region within the TE.

Despite these limits, the main results of co-expression analysis were not affected by the quantification. Most of the results in the paper were replicated when quantification was done on uniquely mapped reads only. The only results that changed between the multi-mapped approach vs. the unique mapping approach were the genes correlated with the L1HS 5’ expression level. For those, we decided to report on results from the multi-mapped reads rather than the unique reads, because of the bias of the uniquely mapped reads we described above.

Stress, immune response and TE expression

Initially, when we started the project, our goal was to identify candidate genes involved in transposon control, based on the co-expression analysis. But once the analysis was done, the results were pointing to what induces TE expression, rather than what suppresses TE expression. Among the genes known to function in transposon control, RNASEH2C (Figure 8), HAUS7, and RNF166 [21] showed negative correlation with L1HS. But several well-known genes with functions in transposon control, e.g. MORC2, SIRT6, KAP1, SAMHD1, MOV10, ZAP, C12orf35 (human ortholog of RESF1), etc. are missing in our list of significantly correlated transcripts. Instead, the major theme that emerged from our results is signal transduction, immune response, and stress response, as seen in the correlation between L1HS and DDI2, and the Ras-MAPK and NF-κB pathways. In humans, various stresses have been shown to induce LINE1 transcription or activation, including chemical compounds [47–49], radiation [50,51], oxidative stress [52] and aging [53]. Most of these studies have observed L1 activity in vitro, by exposing cultured cells to stress factors and assaying the retrotransposition activity.

The negative correlation we find between TE expression and immune gene activity has been reported before in gastrointestinal cancer samples. Jung et al. have shown that the L1 retrotransposition rate is inversely correlated with expression of immunologic response genes [54]. Here, we extend those results and show that the negative correlation between TE expression and immune response is a pattern found in non-tumorous samples as well, across different tissues and different classes of TEs. This relationship is confusing, since it is the opposite of the positive correlation we find between L1HS and NF-κB pathway genes (Figure 8), and the opposite of the pattern observed in several cancer studies, where DNA hypomethylation and expression of endogenous retroviruses activate interferon signaling [55–57]. An immune-active environment surrounding these tumor-adjacent cells, plus nucleic acids in the extra-cellular environment coming from the nearby cancer, may be putting the tumor-adjacent cells in an antiviral state. It is known that interferon signaling induces proteins that act against viruses. ZAP is one example that degrades viral RNA as well as RNA of LINEs and Alus [58], although ZAP does not show correlated expression with the TE modules in our data. We hypothesize that such cell states may reduce transposon transcripts with higher sensitivity through RNA degradation and chromatin remodeling.

The tissue samples in this study are not representative of “normal” cells, as they are collected as controls from tissue adjacent to cancer cells. Although they are not undergoing the molecular changes associated with malignant transformation, they could be under the influence of the nearby environment, with changes in pH levels, inflammation, and infiltration of immune cells. The inclusion criteria for TCGA do not allow patients with any prior systemic chemotherapy or any other neoadjuvant therapy, but they do allow local radiation, and we observe that past local radiation is associated with higher TE expression levels in adjacent cells in thyroid tissues. Given the characteristics of the samples, the variation in TE expression levels or the co-expression pattern we observe in this study may be due to cancer-associated stress. Future studies will be needed to confirm whether the results are replicated in true normal tissue.

TEs and KRAB-ZFPs

ChIP-Seq studies on KRAB-ZFPs have identified extensive binding between this family of proteins and transposable elements [44,59], implying a role for suppressing TE expression. The KRAB domain is a well-known repressor domain and together with the co-factor KAP1 (TRIM28), the KZFP-KAP1 complex has been shown to silence both exogenous retroviruses and endogenous retroelements during embryonic development [60,61]. Based on this observation, and the pattern of co-evolution of retroviral LTRs and the C2H2-Zinc Finger gene family, it has been hypothesized that the KRAB-ZFPs function in transposable element suppression [62]. But except for a few KRAB-ZFPs, most members do not have a characterized function. In an alternative hypothesis, it was proposed that, beyond their original role in silencing, KRAB-ZFPs may also have a role in controlling domesticated transposable elements that contribute to the host transcription regulation network [63]. In our co-expression analysis, we found overwhelming positive correlation between KZFPs and TEs across all classes of TEs. This positive correlation was observed whether we are counting multi-mapped reads or uniquely mapped reads, and whether we are counting TEs close to genes, or TEs in the intergenic regions. Despite this robust positive correlation, we found that the co-expressed relationship showed limited correspondence with published ChIP-seq binding results. There was a statistically meaningful but very small amount of overlap at the family level, and almost no overlap at the locus level. The co-expression we observe seems to be largely an indirect relationship, and not a result of direct binding. There have been observations of a unique chromatin state that is shared between ZFP clusters and repeat classes by the Roadmap Epigenomics project [64]. This chromatin state, termed ZNF/Rpts, is characterized by H3K36me3 marks co-occurring with H3K9me3 marks and high DNA methylation. It is possible that a local chromatin environment that is co-regulated at a larger scale is responsible for the correlation at the RNA level.

Conclusions

TE derived transcripts in the non-tumorous tissues show large variation across tissues, and across individuals. Co-expression network analysis within tissues revealed general co-expression of TEs across all classes. It also found strong co-expression between TEs and KRAB-Zinc Finger Proteins that is replicated in multiple tissues, but not congruent with the direct TE-ZFP binding relationships assayed through ChIP-seq. We also found negative correlation between intronic TEs and mitochondrial genes, and between intergenic TEs and immune response genes, replicated in multiple tissues.

Methods

RNA-Seq and gene expression quantification in the non-tumorous tissues.

We used the gene level quantification provided by The Cancer Genome Atlas (TCGA) for the gene expressions [24–26]. We collected gene level quantifications for 697 samples from TCGA. We focused on cancer types that had at least 10 control samples of RNA-seq data, collected from non-tumorous tissue adjacent to the cancer tissue. As a result, 16 different tissue types were included in our analysis: BLCA (Bladder urothelial carcinoma), BRCA (Breast carcinoma), COAD (Colon adenocarcinoma), ESCA (Esophageal adenocarcinoma), HNSC (Head and neck squamous cell carcinoma), KICH (Kidney chromophobe), KIRC (Kidney renal clear cell carcinoma), KIRP (Kidney renal papillary cell carcinoma), LIHC (Liver hepatocellular carcinoma), LUAD (Lung adenocarcinoma), LUSC (Lung squamous cell carcinoma), PRAD (Prostate adenocarcinoma), READ (Rectum adenocarcinoma), STAD (Stomach adenocarcinoma), THCA (Thyroid carcinoma) and UCEC (Uterine Corpus Endometrial Carcinoma). The number of samples for each tissue is described in Supplementary Table 1. Although we will use the acronym for the cancer type to describe these tissues, we emphasize again that all our samples come from the non-tumorous tissues collected from the same organ of the same patient with the cancer. The cancer tissue samples were not included in our analysis.

Methods for sequencing and data processing of RNA using the RNA-seq protocol for all tissues except esophagus and stomach have been previously described for TCGA in [24–26]. Briefly, RNA was extracted, prepared into poly(A) enriched Illumina TruSeq mRNA libraries, sequenced by Illumina HiSeq2000 (resulting in paired 48-nt reads), and subjected to quality control. Sequencing for esophagus and stomach was done differently from other tissues and has been described in [27]. Briefly, polyA+ mRNA was purified using the MultiMACS mRNA isolation kit on the MultiMACS 96 separator, and double stranded cDNA was synthesized using the Superscript Double-Stranded cDNA synthesis kit. Following the library preparation protocol described in [27], the final DNA was sequenced on Illumina HiSeq2000 with paired end 75-nt reads. RNA reads were aligned to the hg19 genome assembly using Mapsplice [65]. Gene expression was quantified for the transcript models corresponding to the TCGA GAF2.1 using RSEM [31]. We used the raw_count values in the .rsem.genes.results files, rounded to an integer, as the gene level quantification.

Quantifying TE derived transcripts at the locus and family level

WecollectedRNA-seqlevel1binaryalignmentfiles(.bamfiles)for697samples10

(SupplementaryTable1)fromTCGA.Thebamfileswerethenconvertedtofastqand11

realignedtothehg19referencegenomeusingSTARandBowtie1.WiththeSTAR12

alignment,weallowedupto200mappingsforeveryread(--outFilterMultimapNmax13

200--winAnchorMultimapNmax200).WiththeBowtie1alignment,weonlyallowed14

thesinglebestalignmentforeachread,andifthereweremultiplebestalignments,the15

readwasdiscardedfromthefinalalignment(-m1-S-y-v3-X1000--max).Weuseda16

modifiedversionofthesoftwareTEtranscripts[29]forquantifyingthereadsmapping17

toannotatedtransposons.TEtranscriptsisasoftwarethatcanquantifybothgeneand18

TEtranscriptlevelsfromRNAseqexperiments.Ittakesintoaccounttheambiguously19

mappedTE-associatedreadsbyproportionallyassigningreadcountstothe20

correspondingTEfamiliesusinganExpectation-Maximizationalgorithm.We21

implementedtwomodificationtotheoriginalTEtranscriptssoftware.1)Wemodifiedit22

toreportreadcountsforeachindividualTElocusinthereferencegenomeinadditionto23

thefamilylevelcounts.2)Wedevelopedafunctiontodiscountthereadcountsby24

removingreadcountsthatcorrespondtotranscriptscontainingTEsequencesthat25

.CC-BY-NC-ND 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted June 4, 2019. ; https://doi.org/10.1101/385062doi: bioRxiv preprint

Page 29: Transcriptome analyses of tumor-adjacent somatic tissues ... · 12 TE transcript levels, and obtain a global picture of TE expression and regulation in 13 humans. An important strength

29

originatefrompre-mRNAorretainedintronsinthematureRNA[28].Downstream1

analysesweredoneusingthediscountedquantificationbasedonmulti-mappedreads2

andtheuniquelymappedquantificationforboththeSTARalignmentandtheBowtie13

alignment,toassesstheimpactofuncertaintyinmulti-mappedreads.4
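The proportional assignment of ambiguously mapped reads can be illustrated with a toy Expectation-Maximization loop. The sketch below is not the TEtranscripts implementation: the function name `em_assign`, the uniform starting abundances and the fixed iteration count are assumptions made for illustration, and element length and alignment scores are ignored.

```python
from collections import defaultdict

def em_assign(reads, n_iter=50):
    """Toy EM: split each multi-mapped read across its candidate TE families.

    reads: list of lists, each inner list holding the families a read maps to.
    Returns a dict family -> expected read count.
    """
    families = sorted({f for r in reads for f in r})
    # Start from equal relative abundance for every family (assumption).
    abundance = {f: 1.0 / len(families) for f in families}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: share each read among its candidates in proportion
        # to the current abundance estimates.
        for candidates in reads:
            total = sum(abundance[f] for f in candidates)
            for f in candidates:
                counts[f] += abundance[f] / total
        # M-step: re-estimate abundances from the expected counts.
        n = sum(counts.values())
        abundance = {f: counts[f] / n for f in families}
    return {f: counts[f] for f in families}

# Hypothetical reads: three unique to L1HS, one unique to L1PA2, two ambiguous.
reads = [["L1HS"]] * 3 + [["L1PA2"]] + [["L1HS", "L1PA2"]] * 2
estimates = em_assign(reads)
```

In this toy input the two ambiguous reads are shared in favor of the family with more uniquely mapped reads, which is the behavior the proportional assignment is meant to capture.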

The retrotransposon annotations used were generated from the RepeatMasker tables, obtained from the UCSC genome database and provided by TEtranscripts. For quantifying reads mapping to the TE flanking introns we generated gtf files containing 1) the TE flanking intron positions, 2) the intergenic TE positions, and 3) the exonic TE positions (TEs that fall within an exon, including non-coding RNA genes). In the case of intronic TEs, we use the algorithm described above to discount the transcripts from pre-mRNA or retained introns. In the case of intergenic TEs, we count all EM estimated reads mapped to TEs without any discount. In the case of exonic TEs, we ignore those counts altogether, and the exonic TEs contribute to neither the locus count nor the family level count.
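The three cases above can be summarized as a small dispatch function. This is a sketch under stated assumptions: the name `locus_count` and its arguments are hypothetical, and the discount shown (subtracting the read density of the flanking intron scaled to the TE length) is only one plausible reading of the discounting step, not the formula used in the modified TEtranscripts.

```python
def locus_count(category, em_reads, te_length=0, flank_reads=0.0, flank_length=1):
    """Return the read count a TE locus contributes, by genomic context.

    category: "intronic", "intergenic", or "exonic".
    em_reads: EM-estimated reads assigned to the locus.
    flank_reads / flank_length: reads and bases in the flanking intron
    (hypothetical inputs used only for the discount sketch).
    """
    if category == "exonic":
        # Exonic TEs are ignored: no contribution to locus or family counts.
        return 0.0
    if category == "intergenic":
        # Intergenic TEs keep all EM-estimated reads without any discount.
        return float(em_reads)
    if category == "intronic":
        # Assumed discount: subtract the reads expected from pre-mRNA or
        # retained-intron background, scaled from the flanking intron density.
        background = flank_reads / flank_length * te_length
        return max(0.0, em_reads - background)
    raise ValueError(f"unknown category: {category}")

# Hypothetical loci of one family; the family count is the sum over loci.
loci = [
    ("intergenic", 10, 300, 0.0, 1),
    ("intronic", 20, 300, 30.0, 1000),
    ("exonic", 50, 300, 0.0, 1),
]
family_total = sum(locus_count(*locus) for locus in loci)
```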

Normalization and transformation of read counts

After quantifying the reads mapping to annotated genes and TEs, both the gene level counts and the TE counts were normalized between samples across all tissue types with DESeq2. We used the default "median ratio method" for normalization in DESeq2 [66]. Briefly, the scaling factor for each sample is calculated as the median of the ratio, for each gene, of its read count to its geometric mean across all samples. The assumption of the median ratio method is that most genes are not consistently differentially expressed between tissues. If there is a systematic difference in ratio between samples, the median ratio will capture the size relationship. However, this assumption may be violated when we are comparing a large number of tissue types at the same time, since a large proportion of the genes may be differentially expressed in at least one tissue type, or one of the tissues may be extremely biased in its number of differentially expressed genes. In order to achieve more robust normalization, we used a two-step normalization method called the differentially expressed genes elimination strategy (DEGES) [67]. We performed preliminary normalization using the "median ratio method", filtered out potential differentially expressed genes in the data, found a subset of robust non-differentially expressed genes, and used the subset to perform the second round of "median ratio normalization". The resulting pairwise MA plot between tissues after normalization showed better normalization compared to the regular one-step normalization. The size factors for each sample obtained from the two-step normalization on gene counts were then used to normalize the TE quantifications of the same sample. The normalized counts were log2 transformed using the variance stabilizing transformation function in DESeq2 [66,68] for downstream analysis.
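The median ratio calculation and the two-step idea can be sketched in a few lines. This is a minimal illustration, not the DESeq2 or DEGES code: `size_factors` and `two_step_size_factors` are hypothetical names, and the rule used to flag candidate differentially expressed genes is a crude placeholder for the statistical test used in practice.

```python
import math
from statistics import median

def size_factors(counts, genes=None):
    """Median-of-ratios size factors.

    counts: dict gene -> list of raw counts, one per sample.
    genes: optional subset of gene names to use (e.g. presumed non-DE genes).
    """
    use = list(counts) if genes is None else list(genes)
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for g in use:
        row = counts[g]
        if min(row) <= 0:
            continue  # geometric mean would be zero; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in ratios]

def two_step_size_factors(counts, is_candidate_de):
    """Two-step normalization in the spirit of DEGES (sketch).

    is_candidate_de: hypothetical callback taking (gene, normalized_row)
    and returning True when the gene looks differentially expressed.
    """
    first = size_factors(counts)
    keep = [g for g, row in counts.items()
            if not is_candidate_de(g, [c / s for c, s in zip(row, first)])]
    return size_factors(counts, genes=keep)

# Hypothetical data: sample 2 is sequenced twice as deep; gene D is strongly DE.
counts = {"A": [10, 20], "B": [100, 200], "C": [50, 100], "D": [5, 400]}
flag = lambda gene, row: max(row) > 4 * min(row)  # placeholder DE rule
factors = two_step_size_factors(counts, flag)
```

The size factors obtained from the gene counts would then be applied to the TE counts of the same sample by dividing each count by the sample's factor.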

Clustering of samples by expression pattern

We cluster the samples using the "average" method (= UPGMA) in the hclust function of R, and visualize the clusters with the ComplexHeatmap package [69]. The top 150 genes or TEs with the largest variances on the log2 transformed read counts were used for clustering. We did not select genes by any measure of differential expression across tissues. These genes were simply the genes showing the largest variance in read count across all 697 samples, regardless of tissue type. We exclude genes and TEs on X and Y chromosomes. Based on the log2 read count of the top 150 TEs, a dissimilarity matrix is calculated and used for the clustering and visualization. The average method of hclust computes all pairwise dissimilarities between the members of the two clusters and considers the average as the distance between the two clusters. Hierarchical clustering starts with each sample assigned to its own cluster and then proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. For the locus level TE expression, we filtered out all loci that had less than 5 read counts for every sample.

To compare the clustering of samples based on gene expression, TE expression and random assignment, we used the normalized Mutual Information (NMI) measure [70]. The hierarchical clusters were cut off at k=16, the number of different tissue types. Because the resulting clusters were not accurate enough to distinguish between similar tissues, we used a broader tissue grouping to compare with the clusters. The tissues were grouped to 10 broader types based on preliminary clustering: bladder/endometrium (BLCA, UCEC), breast (BRCA), liver (LIHC), colon/rectum (COAD, READ), esophagus/stomach (ESCA, STAD), head and neck (HNSC; the squamous epithelium in the mucosal surfaces inside the mouth, nose, and throat), kidney (KICH, KIRC, KIRP), lung (LUAD, LUSC), prostate (PRAD), and thyroid (THCA). The broader tissue type of each sample was used as the ground truth. Each resulting cluster was then assigned a group label based on the majority tissue type. Normalized mutual information was calculated by comparing the labels from the clustering to the true class labels. Random assignment clusters were generated by permuting the tissue types with and without replacement 100 times, and the mean NMI was reported.
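The majority labelling and the NMI comparison can be sketched as follows. The function names are hypothetical, and normalizing the mutual information by the arithmetic mean of the two entropies is an assumption, since the text does not state which normalization was used.

```python
import math
from collections import Counter

def majority_labels(clusters, truth):
    """Relabel every cluster with the most common true tissue group inside it."""
    members = {}
    for c, t in zip(clusters, truth):
        members.setdefault(c, []).append(t)
    majority = {c: Counter(ts).most_common(1)[0][0] for c, ts in members.items()}
    return [majority[c] for c in clusters]

def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum(k / n * math.log(k / n) for k in Counter(labels).values())

def nmi(pred, truth):
    """Mutual information divided by the mean of the two entropies (assumed)."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    p_pred, p_true = Counter(pred), Counter(truth)
    mi = sum(k / n * math.log(k * n / (p_pred[a] * p_true[b]))
             for (a, b), k in joint.items())
    denom = (entropy(pred) + entropy(truth)) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical example: three clusters, two true tissue groups.
truth = ["liver"] * 3 + ["lung"] * 3
clusters = [1, 1, 1, 2, 2, 3]
score = nmi(majority_labels(clusters, truth), truth)
```

Because clusters 2 and 3 both receive the label "lung", the relabelled clustering matches the ground truth exactly in this toy case; a random permutation of `truth` would be scored the same way to obtain the baseline.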

Co-expression network analysis with TEs and host genes

Weighted correlation network analysis was done with the WGCNA package [34]. We start with the signed pair-wise correlation matrix across the expression levels (normalized log2 read counts) of all genes and TE families. We calculate the adjacency matrix by raising the correlation matrix to the power of 14, a power parameter selected using the scale free topology measure, effectively suppressing the low correlations due to noise. A topological overlap based distance matrix (TOM) is calculated using the network topology resulting from the adjacency matrix. This procedure was repeated for each tissue, and a consensus TOM was calculated across all tissues. We used hierarchical clustering on this consensus topological overlap matrix to identify clusters (modules) that are shared across tissues. A representative gene expression profile of the module is defined by the first principal component of the expression levels of all members in each module. The representative profiles are compared between modules to identify positive and negative correlation between modules.
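The adjacency and topological overlap steps can be written out for a small matrix. The sketch below is plain Python rather than a call into WGCNA: it uses the signed-network adjacency ((1 + r) / 2)^power and the standard unsigned topological overlap formula, and it takes the element-wise minimum across tissues as the consensus, which is an assumption about the consensus rule rather than a statement of the settings used here.

```python
def signed_adjacency(cor, power=14):
    """Signed-network adjacency: ((1 + r) / 2) ** power for each pair."""
    n = len(cor)
    return [[1.0 if i == j else ((1.0 + cor[i][j]) / 2.0) ** power
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """Topological overlap from a symmetric adjacency matrix with unit diagonal."""
    n = len(adj)
    # Connectivity of each node, excluding the self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

def consensus_tom(toms):
    """Element-wise minimum across per-tissue TOMs (an assumed consensus rule)."""
    n = len(toms[0])
    return [[min(t[i][j] for t in toms) for j in range(n)] for i in range(n)]

# Hypothetical correlation matrices for two tissues and three features.
cor_a = [[1.0, 0.9, -0.2], [0.9, 1.0, 0.1], [-0.2, 0.1, 1.0]]
cor_b = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.3], [0.0, 0.3, 1.0]]
toms = [topological_overlap(signed_adjacency(c)) for c in (cor_a, cor_b)]
consensus = consensus_tom(toms)
```

With a power of 14, a correlation of 0.9 keeps an adjacency of about 0.49 while a correlation of 0 drops to about 0.00006, which is the noise suppression described above. Hierarchical clustering would then be run on 1 - consensus as the dissimilarity.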

Correlated expression between genes and L1HS 5'

We blasted all the L1HS instances annotated in RepeatMasker against the L1HS consensus sequence and identified the regions aligning to the 300 bases of the 5' end of the consensus sequence. We counted all the reads mapping to the list of L1HS 5' ends and normalized them with the same size factor described above. We used the log2 transformed value of this normalized read count as the variable representing L1HS transcript level. Correlation between gene and L1HS 5' transcripts was tested in each tissue group separately: bladder, breast, liver, colon/rectum, stomach/esophagus, head and neck, kidney, lung, prostate and thyroid. We tested 20532 genes for each tissue group using a linear model with log2 L1HS 5' expression as the dependent variable, and log2 gene expression as the independent variable. For a gene to be included in our test, it had to be present in at least eight individual patients. We also required that the gene be expressed with a minimum RPM of 2 in 75% of the samples to be included in the dataset. In addition to the radiation therapy for thyroid tissue, we considered effective library size (sum of all normalized counts) and the batch ID provided by the TCGA project as additional covariates. Since there was significant co-expression across all TE classes, especially for the intronic TEs, we included the expression profile of the intronic TE module N1 identified during the co-expression network analysis as a covariate in our linear model. The linear model we used is described below.

$$\log_2 \mathrm{L1HS} \sim \log_2 \mathrm{gene} + \log_2 \mathrm{effective\_library\_size} + \mathrm{batch} + \mathrm{radiation} + \log_2 \mathrm{N1profile}$$

We tested all combinations of linear models that can be created by including or excluding these variables. The second-order Akaike Information Criterion (AICc) was used to select the best linear model. We used the coefficient and p-value from the best model to calculate the q-values. Genes with q-value < 0.0001 in at least two tissues were identified as correlated genes.
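The model search can be illustrated with a small least-squares fit and the AICc formula for Gaussian linear models, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n-k-1). This is a sketch under stated assumptions: the function names are hypothetical, the gene term is always kept while the other covariates are toggled, all covariates are numeric (a categorical batch would need dummy coding), and the data are invented.

```python
import math
from itertools import combinations

def ols_rss(y, columns):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def aicc(rss, n, k):
    """Second-order AIC for a Gaussian linear model with k estimated parameters."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def best_model(y, gene, covariates):
    """Try every subset of optional covariates; the gene term is always kept."""
    n, best = len(y), None
    names = list(covariates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cols = [gene] + [covariates[c] for c in subset]
            k = len(cols) + 2  # slopes + intercept + residual variance
            score = aicc(ols_rss(y, cols), n, k)
            if best is None or score < best[0]:
                best = (score, subset)
    return best

# Invented data: the response depends on the gene and on library size only.
gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
lib = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
junk = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]
noise = [0.1, -0.1, 0.05, -0.05] * 3
y = [2 * g + 3 * s + e for g, s, e in zip(gene, lib, noise)]
score, chosen = best_model(y, gene, {"library_size": lib, "junk": junk})
```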

Correlation between TE and KRAB-ZFPs

To understand the positive correlation between TEs and KRAB-ZFPs, we looked at the correlation between each KZFP and TEs at the family level and at the individual TE locus level in different tissue types. We tested the correlation for 366 KRAB Zinc Finger Proteins that were identified in Imbeault et al. [44] and also found in our gene expression data. Because the search space of pairwise combinations of KZFP and individual TE loci was too large, we examined the relationship in a step-wise approach. In the first step, we tested the correlation between all pairwise combinations of 366 KZFPs and 979 TE subfamilies using the TE quantification at the family level in each tissue type. Then, in the second step, once the significantly correlated KZFP and TE family pairs were identified, we focused on those pairs. We tested the correlation between the expression of the significant KZFP and the expression of each individual locus of the significant TE family in the tissue where the initial co-expression was found, to identify individual TE loci that are co-expressed with the KZFP.
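The step-wise search can be sketched for a single tissue as a nested filter. All names and values below are hypothetical, and a fixed cutoff on the absolute Pearson correlation stands in for the significance test, which is not specified in enough detail here to reproduce.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_step_screen(kzfp_expr, family_expr, locus_expr, family_of, cutoff=0.8):
    """Step 1: KZFP vs TE family; step 2: KZFP vs loci of the passing families.

    cutoff is a placeholder for the significance test used in practice.
    """
    hits = []
    for kzfp, k in kzfp_expr.items():
        for fam, f in family_expr.items():
            if abs(pearson(k, f)) < cutoff:
                continue  # family-level pair not retained; skip its loci
            for locus, values in locus_expr.items():
                if family_of[locus] == fam and abs(pearson(k, values)) >= cutoff:
                    hits.append((kzfp, fam, locus))
    return hits

# Hypothetical expression values across five samples of one tissue.
kzfp_expr = {"ZNF_A": [1, 2, 3, 4, 5]}
family_expr = {"L1PA3": [2, 4, 6, 8, 11], "AluY": [5, 1, 4, 2, 3]}
locus_expr = {
    "L1PA3_dup1": [1, 2, 3, 5, 5],
    "L1PA3_dup2": [3, 1, 2, 1, 3],
    "AluY_dup1": [1, 2, 3, 4, 5],
}
family_of = {"L1PA3_dup1": "L1PA3", "L1PA3_dup2": "L1PA3", "AluY_dup1": "AluY"}
hits = two_step_screen(kzfp_expr, family_expr, locus_expr, family_of)
```

Note that `AluY_dup1` is never tested even though it correlates perfectly with the KZFP, because its family did not pass the first step; this is the price of reducing the search space.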

Overlap between co-expression and binding was examined at the family level and at the locus level. At the family level, we downloaded the family enrichment results from Imbeault et al. [44] and identified pairs of TE families and KZFP that had an enrichment score greater than 1. We compared those families enriched with binding of specific KZFPs to our co-expression results, to check if the TE families were co-expressed with the same KZFPs. At the locus level, we compared the co-expressed TE loci with the binding peaks reported in the dataset GSE78099. We took ±250 bp around the boundary of peaks and found overlap with TE annotations from RepeatMasker. We checked if the TE loci overlapping with ChIP-seq peaks were found to be co-expressed with any KZFPs.
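The locus-level comparison amounts to an interval overlap after padding each peak. The sketch below assumes that "±250 bp around the boundary" means extending each peak by 250 bp on both sides and treats coordinates as half-open intervals; the function name and the example coordinates are hypothetical.

```python
def overlapping_te_loci(peaks, te_loci, pad=250):
    """Return names of TE loci overlapping any peak extended by `pad` bp.

    peaks: list of (chrom, start, end); te_loci: list of (name, chrom, start, end).
    Coordinates are treated as half-open intervals [start, end).
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((max(0, start - pad), end + pad))
    hits = []
    for name, chrom, start, end in te_loci:
        padded = by_chrom.get(chrom, [])
        if any(start < p_end and p_start < end for p_start, p_end in padded):
            hits.append(name)
    return hits

# Hypothetical peak and TE annotations.
peaks = [("chr1", 1000, 1200)]
te_loci = [
    ("L1_a", "chr1", 1300, 1400),  # within 250 bp of the peak end
    ("L1_b", "chr1", 1500, 1600),  # too far away
    ("L1_c", "chr2", 1000, 1200),  # different chromosome
]
bound = overlapping_te_loci(peaks, te_loci)
```

The resulting list of bound loci would then be intersected with the loci found to be co-expressed with a KZFP.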

List of Abbreviations

TE: Transposable Element

KZFP: KRAB Zinc Finger Protein

TCGA: The Cancer Genome Atlas

Declaration

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

The datasets generated and/or analysed during the current study are available in the github repository, https://github.com/HanLabUNLV/TEcoex. The modified version of the TEtranscripts software [29] and the required gtf files can be found at https://github.com/HanLabUNLV/tetoolkit.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Institutes of Health [R15GM116108, P20GM121325 to M.V.H.], and by the National Science Foundation [1750532 to M.V.H.].

Authors' contributions

NC performed the co-expression analysis. GMJ performed TEtranscripts quantification. SQ performed the REC score analysis. NC and AR analyzed the KZFP-TE locus pairwise co-expression and KZFP motif search. CS re-ran the pipeline with the Bowtie and STAR alignment programs. AA, CC, DC and ON assisted with the analyses. MVH designed the experiments, modified the TEtranscripts software, and analyzed and interpreted the data. MVH wrote the manuscript with the help of all other authors. All authors read and approved the final manuscript.

Acknowledgements

We thank the TCGA and GTEx project teams for making the data available.

References

1. Slotkin RK. The case for not masking away repetitive DNA. Mob DNA. 2018;9:15.
2. Branciforte D, Martin SL. Developmental and cell type specificity of LINE-1 expression in mouse testis: implications for transposition. Mol Cell Biol. 1994;14:2584–92.
3. Trelogan SA, Martin SL. Tightly regulated, developmentally specific expression of the first open reading frame from LINE-1 during mouse embryogenesis. Proc Natl Acad Sci. 1995;92:1520–4.
4. Ergün S, Buschmann C, Heukeshoven J, Dammann K, Schnieders F, Lauke H, et al. Cell Type-specific Expression of LINE-1 Open Reading Frames 1 and 2 in Fetal and Adult Human Tissues. J Biol Chem. 2004;279:27753–63.
5. Kubo S, Seleme M del C, Soifer HS, Perez JLG, Moran JV, Kazazian HH, et al. L1 retrotransposition in nondividing and primary human somatic cells. Proc Natl Acad Sci. 2006;103:8036–41.

6. Belancio VP, Roy-Engel AM, Pochampally RR, Deininger P. Somatic expression of LINE-1 elements in human tissues. Nucleic Acids Res. 2010;38:3909–22.
7. Rangwala SH, Zhang L, Kazazian HH. Many LINE1 elements contribute to the transcriptome of human somatic cells. Genome Biol. 2009;10:R100.
8. Skowronski J, Singer MF. Expression of a cytoplasmic LINE-1 transcript is regulated in a human teratocarcinoma cell line. Proc Natl Acad Sci U S A. 1985;82:6050–4.
9. Bratthauer GL, Cardiff RD, Fanning TG. Expression of LINE-1 retrotransposons in human breast cancer. Cancer. 1994;73:2333–6.
10. Rodić N, Sharma R, Sharma R, Zampella J, Dai L, Taylor MS, et al. Long Interspersed Element-1 Protein Expression Is a Hallmark of Many Human Cancers. Am J Pathol. 2014;184:1280–6.
11. Bratthauer GL, Fanning TG. Active LINE-1 retrotransposons in human testicular cancer. Oncogene. 1992;7:507–10.
12. Philippe C, Vargas-Landin DB, Doucet AJ, Essen D van, Vera-Otarola J, Kuciak M, et al. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. eLife. 2016;5:e13926.
13. Muotri AR, Chu VT, Marchetto MCN, Deng W, Moran JV, Gage FH. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005;435:903.
14. Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, et al. The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009;41:563–71.
15. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, et al. Landscape of transcription in human cells. Nature. 2012;489:101–8.
16. Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2016;18:71.
17. Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 2014;24:1963–76.
18. Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, et al. Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs. PLOS Genet. 2013;9:e1003470.
19. Jachowicz JW, Bing X, Pontabry J, Bošković A, Rando OJ, Torres-Padilla M-E. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat Genet. 2017;49:1502.
20. Percharde M, Lin C-J, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, et al. A LINE1-Nucleolin Partnership Regulates Early Development and ESC Identity. Cell. 2018;174:391-405.e19.
21. Liu N, Lee CH, Swigut T, Grow E, Gu B, Bassik MC, et al. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature. 2017;553:228.
22. Taylor MS, Altukhov I, Molloy KR, Mita P, Jiang H, Adney EM, et al. Dissection of affinity captured LINE-1 macromolecular complexes. eLife. 2018;7:e30094.
23. Mita P, Wudzinska A, Sun X, Andrade J, Nayak S, Kahler DJ, et al. LINE-1 protein localization and functional dynamics during the cell cycle. eLife. 2018;7:e30058.
24. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330.
25. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61.
26. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519.

27. The Cancer Genome Atlas Research Network, Bass AJ, Thorsson V, Shmulevich I, Reynolds SM, Miller M, et al. Comprehensive molecular characterization of gastric adenocarcinoma. Nature. 2014;513:202.
28. Deininger P, Morales ME, White TB, Baddoo M, Hedges DJ, Servant G, et al. A comprehensive approach to expression of L1 loci. Nucleic Acids Res. 2017;45:e31–e31.
29. Jin Y, Tam OH, Paniagua E, Hammell M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinforma Oxf Engl. 2015;31:3593–9.
30. Britten RJ. Mobile elements inserted in the distant past have taken on important functions. Gene. 1997;205:177–82.
31. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
32. Yang WR, Ardeljan D, Pacyna CN, Payer LM, Burns KH. SQuIRE reveals locus-specific regulation of interspersed repeat expression. Nucleic Acids Res. 2019;47:e27–e27.
33. Doucet-O'Hare TT, Rodić N, Sharma R, Darbari I, Abril G, Choi JA, et al. LINE-1 expression and retrotransposition in Barrett's esophagus and esophageal carcinoma. Proc Natl Acad Sci. 2015;112:E4894.
34. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
35. Desai N, Sajed D, Arora KS, Solovyov A, Rajurkar M, Bledsoe JR, et al. Diverse repetitive element RNA expression defines epigenetic and immunologic features of colon cancer. JCI Insight. 2017;2:e91078.
36. Solovyov A, Vabret N, Arora KS, Snyder A, Funt SA, Bajorin DF, et al. Global Cancer Transcriptome Quantifies Repeat Element Polarization between Immunotherapy Responsive and T Cell Suppressive Classes. Cell Rep. 2018;23:512–21.
37. Menendez L, Benigno BB, McDonald JF. L1 and HERV-W retrotransposons are hypomethylated in human ovarian carcinomas. Mol Cancer. 2004;3:12.
38. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90–7.
39. Choi J, Hwang S-Y, Ahn K. Interplay between RNASEH2 and MOV10 controls LINE-1 retrotransposition. Nucleic Acids Res. 2018;46:1912–26.
40. de la Rica L, Deniz Ö, Cheng KCL, Todd CD, Cruz C, Houseley J, et al. TET-dependent regulation of retrotransposable elements in mouse embryonic stem cells. Genome Biol. 2016;17:234.
41. Macfarlan T, Kutney S, Altman B, Montross R, Yu J, Chakravarti D. Human THAP7 Is a Chromatin-associated, Histone Tail-binding Protein That Represses Transcription via Recruitment of HDAC3 and Nuclear Hormone Receptor Corepressor. J Biol Chem. 2005;280:7346–58.
42. Koizumi S, Irie T, Hirayama S, Sakurai Y, Yashiroda H, Naguro I, et al. The aspartyl protease DDI2 activates Nrf1 to compensate for proteasome dysfunction. Dikic I, editor. eLife. 2016;5:e18357.
43. Sun X, Wang X, Tang Z, Grivainis M, Kahler D, Yun C, et al. Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression. Proc Natl Acad Sci. 2018;115:E5526.
44. Imbeault M, Helleboid P-Y, Trono D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature. 2017;543:550–4.

45. The ENCODE Project Consortium, Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57.
46. Scott EC, Gardner EJ, Masood A, Chuang NT, Vertino PM, Devine SE. A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer. Genome Res. 2016;26:745–55.
47. Okudaira N, Iijima K, Koyama T, Minemoto Y, Kano S, Mimori A, et al. Induction of long interspersed nucleotide element-1 (L1) retrotransposition by 6-formylindolo[3,2-b]carbazole (FICZ), a tryptophan photoproduct. Proc Natl Acad Sci. 2010;107:18487–18492.
48. Stribinskis V, Ramos KS. Activation of Human Long Interspersed Nuclear Element 1 Retrotransposition by Benzo(a)pyrene, an Ubiquitous Environmental Carcinogen. Cancer Res. 2006;66:2616–2620.
49. Terasaki N, Goodier JL, Cheung LE, Wang YJ, Kajikawa M, Kazazian HH Jr, et al. In Vitro Screening for Compounds That Enhance Human L1 Mobilization. PLOS ONE. 2013;8:e74629.
50. Banaz-Yaşar F, Gedik N, Karahan S, Diaz-Carballo D, Bongartz BM, Ergün S. LINE-1 Retrotransposition Events Regulate Gene Expression After X-Ray Irradiation. DNA Cell Biol. 2012;31:1458–67.
51. Farkash EA, Kao GD, Horman SR, Prak ETL. Gamma radiation increases endonuclease-dependent L1 retrotransposition in a cultured cell assay. Nucleic Acids Res. 2006;34:1196–204.
52. Giorgi G, Marcantonio P, Del Re B. LINE-1 retrotransposition in human neuroblastoma cells is affected by oxidative stress. Cell Tissue Res. 2011;346:383–91.
53. Van Meter M, Kashyap M, Rezazadeh S, Geneva AJ, Morello TD, Seluanov A, et al. SIRT6 represses LINE1 retrotransposons by ribosylating KAP1 but this repression fails with stress and age. Nat Commun. 2014;5:5011.
54. Jung H, Choi JK, Lee EA. Immune signatures correlate with L1 retrotransposition in gastrointestinal cancers. Genome Res [Internet]. 2018; Available from: http://genome.cshlp.org/content/early/2018/07/03/gr.231837.117.abstract
55. Chiappinelli KB, Strissel PL, Desrichard A, Li H, Henke C, Akman B, et al. Inhibiting DNA Methylation Causes an Interferon Response in Cancer via dsRNA Including Endogenous Retroviruses. Cell. 2015;162:974–86.
56. Roulois D, Loo Yau H, Singhania R, Wang Y, Danesh A, Shen SY, et al. DNA-Demethylating Agents Target Colorectal Cancer Cells by Inducing Viral Mimicry by Endogenous Transcripts. Cell. 2015;162:961–73.
57. Haffner MC, Taheri D, Luidy-Imada E, Palsgrove DN, Eich M-L, Netto GJ, et al. Hypomethylation, endogenous retrovirus expression, and interferon signaling in testicular germ cell tumors. Proc Natl Acad Sci. 2018;115:E8580.
58. Moldovan JB, Moran JV. The Zinc-Finger Antiviral Protein ZAP Inhibits LINE and Alu Retrotransposition. PLOS Genet. 2015;11:e1005121.
59. Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol. 2015;33:555–62.
60. Wolf D, Goff SP. Embryonic stem cells use ZFP809 to silence retroviral DNAs. Nature. 2009;458:1201–4.
61. Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014;516:242–5.

62. Rowe HM, Trono D. Dynamic control of endogenous retroviruses during development. Virology. 2011;411:273–87.
63. Trono D. Transposable Elements, Polydactyl Proteins, and the Genesis of Human-Specific Transcription Networks. Cold Spring Harb Symp Quant Biol. 2015;80:281–8.
64. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317.
65. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38:e178–e178.
66. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
67. Kadota K, Nishiyama T, Shimizu K. A normalization strategy for comparing tag count data. Algorithms Mol Biol. 2012;7:5.
68. Huber W, von HA, Sueltmann H, Poustka A, Vingron M. Parameter estimation for the calibration and variance stabilization of microarray data. Stat Appl Genet Mol Biol. 2003;2:Article3.
69. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–9.
70. Cover TM, Thomas JA. Elements of Information Theory. New York: Wiley & Sons; 1991.
71. Caracausi M, Piovesan A, Antonaros F, Strippoli P, Vitale L, Pelleri MC. Systematic identification of human housekeeping genes possibly useful as references in gene expression studies. Mol Med Rep. 2017;16:2397–410.
72. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet TIG. 2013;29:569–74.

Figures

Figure 1. Comparison of read alignment on full length L1HS with multi-mapping and unique mapping.

Reads mapped to two different full length L1HS elements from a stomach tissue sample (A4GY) visualized through IGV. a. L1HS_dup967, a 5799 nt length element on chromosome X:75453754-75459553. b. a 6225 nt length element L1HS_dup924 on chromosome X:11953208-11959433. Chromosomal locations, 48 base mappability calculated with GEM, Bowtie1 alignment allowing only uniquely mapped reads with single best hit, STAR alignment allowing multi-mapped reads up to 200 mappings for each read, gene annotation and RepeatMasker TE annotation are shown from top to bottom. Red lines mark the boundary of the L1HS elements with 5' and 3' noted.

Figure 2. Relationship between read counts for each L1HS locus, and the mappability of the locus.

Log2 transformed total read counts mapped to each L1HS locus in the hg19 genome, summed across all samples in our dataset, are plotted against the total uniquely mappable positions (number of positions with mappability score = 1 based on 48 bp mappability calculated with GEM) for each locus. L1HS loci with zero read counts are marked at -1 instead of -infinity. a. read counts from Bowtie1 alignment with uniquely mapped reads only. b. read counts from STAR alignment allowing multi-mapped reads up to 200. c. comparison of read counts for each L1HS locus between the uniquely mapped reads and multi-mapped reads.

Figure 3. Tissue clustering based on transposable elements

Heatmap showing the tissue clustering results based on the top 150 TEs with the largest variance of each class. With the color bars at the top of the heatmap, the upper color labels show tissue types and the lower color labels show broader tissue groupings. a-d. clustering based on family level quantification. e-h. clustering based on locus level quantification. i. Clustering quality measured by Mutual Information for clustering results based on family level TE quantification, locus level quantification for TEs 100Kb away from genes, locus level quantification for TEs 1Kb away from genes, and locus level TE quantification without filtering for gene proximity.

Figure 4. Representative transposable element loci that show tissue-specific expression

15 TE loci with tissue-specific expression were identified as representative examples among intergenic TEs that are 100Kb away from start and end of genes. Heatmap color reflects the z-score of normalized log2 read counts across samples.

Figure 5. Variation in TE expression across tissues and individuals

a. Normalized and log-transformed sum of all read counts mapping to the TEs of each class, LINE, DNA, SINE and LTR, are shown as violin plots. Mean read count of a set of housekeeping genes (Supplementary Table 6) is plotted as a reference. The horizontal line across the violin plot represents the median value across all samples of that tissue type. b. Normalized and log-transformed sum of all read counts mapping to 300 bp at the 5' end of L1HS is plotted as a violin plot. The same set of housekeeping genes used in a. is plotted as a reference.

Figure 6. Co-expression modules with immune genes are negatively correlated with TE modules.

a. Enriched annotations for genes belonging to module M33 identified in the Reactome database. b. Enriched annotations for genes belonging to module M35 identified in the Reactome database. c. Top 30 genes with the highest module membership (high correlation with the representative profile of the module) for each module. d-e. Correlation plots showing high within-group correlation and negative between-group correlation between the TEs in the TE modules and the immune genes in modules M33 and M35. d. shows data from breast, and e. shows data from esophagus. The color label on top of each correlation plot shows the different classes of TEs, and the genes that are annotated with the GO term “immune system process”.
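The notion of module membership used in Figure 6c (correlation of a gene with the representative profile of its module) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the representative profile is approximated here by the mean of the standardized member profiles, whereas network packages such as WGCNA typically use the first principal component (the module eigengene). The names `module_membership` and `top_members` and the toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standardize(x):
    """Center and scale one profile (assumes the profile is not constant)."""
    mu, sd = mean(x), pstdev(x)
    return [(v - mu) / sd for v in x]


def module_membership(expr):
    """Correlate each gene with a representative module profile.

    expr maps gene name -> expression values across the same samples.
    The representative profile is the mean of standardized profiles,
    a simple stand-in for a module eigengene (assumption of this sketch).
    """
    z = {gene: standardize(values) for gene, values in expr.items()}
    n_samples = len(next(iter(z.values())))
    profile = [mean(z[g][i] for g in z) for i in range(n_samples)]
    return {gene: pearson(values, profile) for gene, values in expr.items()}


def top_members(expr, k=30):
    """Genes with the k highest module membership values."""
    mm = module_membership(expr)
    return sorted(mm, key=mm.get, reverse=True)[:k]


# Made-up expression values for three genes in five samples.
toy_expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 11],
    "geneC": [5, 1, 4, 2, 3],
}
print(top_members(toy_expr, k=2))
```

Ranking by this value favors genes whose expression tracks the module as a whole, which is why the top-ranked members are often treated as the most representative genes of a module.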

Figure 7. Co-expression modules with mitochondrial genes and ribosome genes are negatively correlated with intronic TEs.

a. Enriched annotations for genes belonging to module N4 identified in the Reactome database. b-c. Correlation plots showing high within-group correlation and negative between-group correlation between the TEs in the intronic TE module N1 and the mitochondrial and ribosomal genes in module N4. b. shows data from breast, and c. shows data from esophagus. The color label on top of each correlation plot shows the different classes of TEs, and the genes that are annotated with the GO terms “mitochondrion” and “ribosome”.
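The pattern summarized in the correlation plots of Figure 7b-c (high within-group correlation, negative between-group correlation) can be quantified with a short sketch such as the one below. It assumes Pearson correlation and a simple average over pairs; the function name `mean_block_correlations` and the toy profiles are hypothetical and are not taken from the manuscript.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise(profiles):
    """Average correlation over all distinct pairs within one group."""
    pairs = [
        pearson(profiles[i], profiles[j])
        for i in range(len(profiles))
        for j in range(i + 1, len(profiles))
    ]
    return mean(pairs)


def mean_block_correlations(group_a, group_b):
    """Return (within A, within B, between A and B) mean correlations.

    Each group maps a feature name to its profile across the same samples
    and must contain at least two features.
    """
    a, b = list(group_a.values()), list(group_b.values())
    between = mean(pearson(x, y) for x in a for y in b)
    return mean_pairwise(a), mean_pairwise(b), between


# Made-up profiles: two intronic TEs and two genes across four samples.
toy_tes = {"te1": [1, 2, 3, 4], "te2": [2, 3, 5, 6]}
toy_genes = {"g1": [9, 7, 4, 2], "g2": [8, 7, 5, 1]}
print(mean_block_correlations(toy_tes, toy_genes))
```

In a correlation plot ordered by group, the two within-group averages correspond to the diagonal blocks and the between-group average to the off-diagonal block.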

Figure 8. Genes that show positive and negative correlation with the transcript level of L1HS 5’ end.

Genes that show significant positive and negative correlation with L1HS 5’ in multiple tissues. Esophagus and stomach are combined as one tissue group.


Figure 9. Overlap between co-expressed KZFP-TE family pairs and TE families enriched for KZFP binding.

a. KZFP-TE families co-expressed in at least two tissues are marked with pink, TE families enriched for KZFP binding are marked with yellow, and TE families that are both bound by a KZFP and co-expressed with the same KZFP are marked with green. KZFPs that show overlap of binding and co-expression for multiple TE families are labeled along the vertical axis. b-c. Categorization of co-expression and KZFP binding for all 200,889 KZFP-TE family pair-wise combinations (221 KZFPs x 909 TE families) that have both expression and ChIP-exo data. b. counts co-expression significant in at least one tissue. c. counts co-expression significant in at least two tissues.
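The categorization in Figure 9b-c can be illustrated with a small sketch that classifies every KZFP-TE family pair by co-expression and binding status. The function name `categorize_pairs`, the input layout, and the toy identifiers are hypothetical; only the category logic and the tissue thresholds (at least one tissue for b, at least two for c) follow the legend.

```python
from collections import Counter
from itertools import product


def categorize_pairs(kzfps, te_families, coexpressed_tissues, bound_pairs,
                     min_tissues=1):
    """Count KZFP-TE family pairs by co-expression and binding status.

    coexpressed_tissues maps (kzfp, te_family) to the number of tissues in
    which the pair is significantly co-expressed (missing pairs count as 0).
    bound_pairs is the set of (kzfp, te_family) pairs in which the TE family
    is enriched for binding by that KZFP.
    """
    counts = Counter()
    for pair in product(kzfps, te_families):
        coexpressed = coexpressed_tissues.get(pair, 0) >= min_tissues
        bound = pair in bound_pairs
        if coexpressed and bound:
            counts["both"] += 1
        elif coexpressed:
            counts["co-expressed only"] += 1
        elif bound:
            counts["bound only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Made-up example with 2 KZFPs and 3 TE families (6 pairs in total).
toy_kzfps = ["K1", "K2"]
toy_tes = ["TE_a", "TE_b", "TE_c"]
toy_coexp = {("K1", "TE_a"): 2, ("K1", "TE_b"): 1, ("K2", "TE_c"): 3}
toy_bound = {("K1", "TE_a"), ("K2", "TE_b")}

print(categorize_pairs(toy_kzfps, toy_tes, toy_coexp, toy_bound, min_tissues=1))
print(categorize_pairs(toy_kzfps, toy_tes, toy_coexp, toy_bound, min_tissues=2))
```

Because every pair falls into exactly one category, the four counts always sum to the number of pair-wise combinations, which for the data in the legend is 221 x 909 = 200,889.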

Figure 10. Overlap between expression and KZFP binding for TE loci

a. Categorization of all 4,496,028 TE loci annotated in hg19 RepeatMasker by expression and KZFP binding. b. Overlap of expression and KZFP binding with ENCODE Candidate Regulatory Element marks. c. Proportion of each category of TEs that are marked with ENCODE Candidate Regulatory Element marks.


Tables

Table 1. Transposon loci that show a large difference after correcting for pre-mRNA/retained introns.

a. Transposon loci embedded within introns or exons of genes that frequently result in the largest correction in each sample. Locus id, genomic location, surrounding gene and structure the TE is embedded in, and the maximum number of reads removed in a sample.

| locus | chr | start | end | surrounding gene | TE embedded in | # of samples | Max correction |
|---|---|---|---|---|---|---|---|
| MIRc_dup47590 | 8 | 22021288 | 22021431 | SFTPC | Intron 4, Exon 5 | 105 | 428021 |
| AluY_dup80589 | 12 | 69747275 | 69747567 | LYZ | Exon 4 | 62 | 313317 |
| MIRb_dup137684 | 10 | 81315669 | 81315913 | SFTPA2 | Exon 5 | 106 | 266566 |
| MIRb_dup137689 | 10 | 81374907 | 81375150 | SFTPA1 | Exon 5 | 106 | 230394 |
| MIR3_dup57107 | 10 | 81316603 | 81316678 | SFTPA2 | Exon 5 | 103 | 90241 |
| AluSz6_dup3320 | 1 | 207102295 | 207102608 | PIGR | Exon 11 | 124 | 59130 |
| MIRc_dup74805 | 12 | 50351953 | 50352157 | AQP2 | Exon 4 | 109 | 57581 |
| LTR39_dup404 | 6 | 160102172 | 160102969 | SOD2 | Exon 4, Intron 7 | 119 | 34545 |
| AluSx1_dup59209 | Y | 21153222 | 21153521 | TTTY14 | Exon 1 | 305 | 25867 |
| AluJb_dup119100 | 17 | 16344881 | 16345132 | C17orf76-AS1 | Intron 4, Exon 5 | 253 | 21915 |


Table 2. KZFP gene members in core TE modules

KZFP genes that are members of the core TE modules, and module M3 that is correlated with core TE modules.

| core TE modules | KZFP | chromosome | KZFPs in correlated module M3 | chromosome |
|---|---|---|---|---|
| M8 | HKR1 | 19 | KZFP169 | 9 |
| | KZFP226 | 19 | KZFP202 | 11 |
| | KZFP682 | 19 | KZFP266 | 19 |
| | KZFP789 | 7 | KZFP300 | 5 |
| | KZFP814 | 19 | KZFP320 | 19 |
| M21 | KZFP404 | 19 | KZFP431 | 19 |
| | KZFP418 | 19 | KZFP439 | 19 |
| | KZFP589 | 3 | KZFP44 | 19 |
| | KZFP75A | 19 | KZFP587 | 19 |
| M38 | KZFP117 | 7 | KZFP662 | 3 |
| M45 | KZFP334 | 20 | KZFP7 | 8 |
| | KZFP493 | 19 | KZFP700 | 19 |
| | KZFP506 | 19 | KZFP708 | 19 |
| | KZFP721 | 4 | KZFP714 | 19 |
| | KZFP737 | 19 | KZFP732 | 4 |
| | KZFP83 | 19 | KZFP841 | 19 |


Table 3. Gene members in the intronic TE module N1.

Genes that are members of the intronic TE module N1, the number of tissues they are assigned to module N1 in, synonyms to gene names, full gene names.

| Gene Symbol | tissues | Synonyms | Full gene name |
|---|---|---|---|
| NCRNA00201 | 7 | HNRNPU | heterogeneous nuclear ribonucleoprotein U |
| AHSA2 | 6 | AHSA2P | activator of HSP90 ATPase homolog 2, pseudogene |
| CCNL2 | 6 | CCNL2 | cyclin L2 |
| CG030 | 5 | N4BP2L2-IT2 | N4BPL2 intronic transcript 2 |
| FAM13AOS | 5 | FAM13A-AS1 | FAM13A antisense RNA 1 |
| MDM4 | 5 | MDM4 | MDM4 regulator of p53 |
| NKTR | 5 | NKTR | natural killer cell triggering receptor |
| SLC25A27 | 5 | UCP4 | solute carrier family 25 member 27 |
| ANKRD36 | 4 | ANKRD36 | ankyrin repeat domain 36 |
| LOC100190986 | 4 | LOC100190986 | uncharacterized LOC100190986 |
| LOC440944 | 4 | THUMPD3-AS1, SETD5-AS1 | THUMPD3 antisense RNA 1 |
| LOC91316 | 4 | GUSBP11 | GUSB pseudogene 11 |
| LUC7L | 4 | LUC7L | LUC7 like |
| LUC7L3 | 4 | LUC7L3 | LUC7 like 3 pre-mRNA splicing factor |
| NCRNA00105 | 4 | ASMTL-AS1 | ASMTL antisense RNA 1 |
| OGT | 4 | OGT | O-linked N-acetylglucosamine (GlcNAc) transferase |
| SEC31B | 4 | SEC31B | SEC31 homolog B, COPII coat complex component |
| KZFP789 | 4 | KZFP789 | zinc finger protein 789 |
