Chromes from Chromatin: Sonification of the …...2016/01/21 · 21 Introduction 22 Sonification is...
Transcript of Chromes from Chromatin: Sonification of the …...2016/01/21 · 21 Introduction 22 Sonification is...
ChromesfromChromatin:SonificationoftheEpigenome1
2
DavideCittaro1,DejanLazarevic1,PaoloProvero23
4
Authoraffiliations5
1:CenterforTranslationalGenomicsandBioinformatics,SanRaffaeleHospital-viaOlgettina58,201386
Milano,Italy7
2:DepartmentofMolecularBiotechnologyandLifeSciences,UniversityofTurin–viaNizza52,101268
Torino,Italy9
10
Correspondingauthor11
Davide Cittaro; Center for Translational Genomics and Bioinformatics, San Raffaele Hospital- via12
olgettina58,20138Milano,Italy;telephone:+390226439137;email:[email protected]
14
Keywords15
Sonification;Chromatin;Music;Informationcontent16
17
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Abstract1
Theepigeneticmodificationsareorganized inpatternsdeterminingthe functionalpropertiesof the2
underlyinggenome.Suchpatterns,typicallymeasuredbyChIP-seqassaysofhistonemodifications,can3
becombinedandtranslatedintomusicalscores,summarizingmultiplesignalsintoasinglewaveform.4
As music is recognized as a universal way to convey meaningful information (1), we wanted to5
investigateproperties ofmusic obtainedby sonificationof ChIP-seqdata.We show that themusic6
produced by such quantitative signals is perceived by human listener as more pleasant than that7
producedfromrandomizedsignals.Moreover,thewaveformcanbeanalyzedtopredictphenotypic8
properties,suchasdifferentialgeneexpression.9
10
SignificanceStatement11
Music is recognized as universal way to communicate emotions and,more in general, meaningful12
information. Various sources of information can be translated into music or sounds, mostly for13
recreationalpurposes.Ithasbeenshownthathumanearcanclassifyinformationencodedintosounds.14
Quantitativegenomicfeatures,andinparticularepigeneticmarks,dorepresentfunctionalinformation15
thatisexploitedbycellstodrivebiologicalprocesses.Wetestamethodtotranslatesuchinformation16
intomusicandwestudysomepropertiesofthesonificatedchromatinmarks.Weshowthatnotonly17
musicalrepresentationofepigeneticmarkshasintrinsicmusicality,butalsothatdifferencesinmusical18
representationofgenomiclocireflectdifferencesoftheRNAlevelsoftheunderlyinggenes.19
20
Introduction21
Sonificationistheprocessofconvertingdataintosound.Sonificationitselfhasalong,yetpunctuated,22
storyofapplicationsinmolecularbiology,severalalgorithmstotranslateDNA(2)orproteinsequences23
(3, 4) tomusical scores have been proposed. The sameprinciples have also been extended to the24
analysisofcomplexdata(5)showingthat,all inall,sonificationcanbeusedtodescribeandclassify25
data.Indeed,theverysameproceduresmayalsobeappliedforrecreationalpurposes.26
27
OneofthelimitationsofsonificationofactualDNAandproteinsequencesistheirintrinsicconservative28
nature.Assuming thedifferences in two individual genomes are, on average, onenucleotideevery29
kilobase(6),thecorrespondingmusicalscoreswouldhavelittledifferences.30
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Onthecontrary,dynamicrangestypicaloftranscriptomicandepigenomicdatamayprovidearicher1
sourceforsonification.2
3
In thisworkwedescribeanapproach toconvertChIP-seqsignals,and inprincipleanyquantitative4
genomicfeature,intoamusicalscore.Westartedworkingonourapproachforamusementmainly,and5
werealizedthatthesonificatedchromatinsignalweresurprisinglyharmonious.Wethentriedtoassess6
somepropertiesofthemusictrackswewereabletogenerate.Weshowthattheemergingsoundsare7
notrandomandinsteadappearmoremelodiousandtunefulthanmusicgeneratedfromrandomized8
notes.WealsoshowthatdifferentChIP-seqsignalscanbecombinedintoasinglemusicaltrackand9
that tracks representing different conditions can be compared allowing for the prediction of10
differentiallyexpressedgenes.11
Examples of sonification for various genomic loci are available at https://soundcloud.com/davide-12
cittaro/sets/k56213
14
Definitions15
MIDI: MIDI (Musical Instrument Digital Interface) is a standard that describes protocols for data16
exchangeamongavarietyofdigitalmusicalinstruments,computersandrelateddevices.MIDIformat17
encodes information about note notation, pitch, velocity and other parameters controlling note18
execution(e.g.volumeandsignalsforsynchronization).19
MIDIfileformat:abinaryformatrepresentingMIDIdatainahierarchicalsetofobjects.Atthetopof20
hierarchythereisaPattern,whichcontainsalistofTracks.ATrackisalistofMIDIevents,encodingfor21
noteproperties.MIDIeventshappenatspecifictime,whichisalwaysrelativetothestartofthetrack.22
MIDIResolution:resolutionsetsthenumberoftimesthestatusbyteissentforaquarternote.The23
highertheresolution,themorenaturalthesoundisperceived.ResolutionisthenumberofTicksper24
quarternote.AtaspecificresolutionR,TickdurationinmicrosecondsTisrelatedtotempo(expressed25
inBeatsperMinute,BPM)bythefollowingequation26
27
! = 60%&'(28
29
30
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Results1
Approach2
InordertotranslateasingleChIP-seqsignaltracktomusicwebinthesignaloveraspecifiedgenomic3
interval (i.e. chrom:start-end) into fixed-size windows (e.g. 300 bp) and note duration will be4
proportionaltothesizeofsuchwindows.AswearedealingwithMIDIstandard,welettheuserspecify5
track resolution and the number of ticks per window (see definitions); the combination of these6
parametersdefinesthedurationofasinglenote.Thedefaultparametersassociateabinof300bpwith7
onequaver(1/8note).8
9
Inordertodefinethenotepitch,wetakethelogarithmoftheaverageintensityoftheChIP-seqsignal10
in a genomicbin. The sounding rangeof thewhole signal is discretized in apredefinednumberof11
semitones.Atdefaultparameters,therangeisbinnedinto52semitones,coveringfouroctaves.Inorder12
tointroducepauses,thelowestbinofthesignalrangerepresentsarest.Iftwoconsecutivenotesor13
restsfallinthesamebin,wemergetheminonenotedoublingitsduration.14
15
Usingthisapproach,anyChIP-seqsignalcanbemappedtoachromaticscale.Weimplementedthe16
possibilitytomapasignalonadifferentscale(major,minor,pentatonic…);tothisend,intensitybin17
boundariesaremergedaccordingtothedefinitionofaspecificscale(Figure1).MIDItracksproduced18
in thisway canbe then imported intoa sequencer softwarewhere they canbe furtherprocessed,19
settingtempoandtimesignature.20
21
Musicproducedfromchromatinmarksisnotperceivedasarandompattern22
Inordertotestwhethersonificationofchromatinmarksareperceivedasrandompatterns,weselected23
tengenomicregionsandgeneratedcorrespondingtracksbasedonthefollowinghistonemodifications:24
H3K27me3,H3K27ac,H3K9ac,H3K36me3,H3K4me1,H3K4me2,H3K4me3,H3K9me325
(SupportingAudiofilesS1.1toS10.1).Forthesameregions,werandomizedgenomicsignalatbase26
and bin level (Supporting Audio files S1.2 to S10.2).When data are randomized at base level, the27
average intensity isuniformsacross thebins, resulting ina repeatednote (datanot shown); this is28
largelyexpectedasChIP-seqsignalsaredistributedonthegenomeaccordingtoaPoissonlaw(7)or,29
moreprecisely,toaNegativeBinomiallaw(8).30
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
1
Randomizationatbinlevel,instead,equalstoshufflingnotesduringtheexecution.Weadministrated2
aquestionnairetoasetofvolunteers(n=8)notpreviouslytestedforeducationinmusic.Volunteers3
wereaskedtolistentoeachpairoforiginal/randomtrackandchoosewhichtracktheyfeltwasmore4
appealing.Trackorderwasrandomizedwhentestingdifferentvolunteers.Notably,inthemajorityof5
thecases(62/80)themusicgeneratedfromgenomicsignalwithoutrandomizationwasjudgedmore6
appealing.Resultsaresignificant toaFisher-exact test (p=1.95e-3), suggesting thatgenomic signals7
contain informationthatcanberecognizedbyhumanear.Thenumberofcorrectanswersforeach8
volunteerrangedfrom5to10,withamedianvalueof8.9
10
Differencesinmusicaltracksreflectdifferencesingeneexpression11
Onceweassessedtheexistenceofmusicalpatterningenomicssignals,wewerekeentoexploreifthis12
kindofinformationcouldbeexploitedtoidentifybiologicalfeaturesofsamples.Sincetheepigenetic13
DNAmodificationsreflectedbyhistonemarksinfluencegeneexpression(9),wetestedifdifferencesin14
musicaltracksgeneratedfromvariousChIP-seqsignalsreflectsdifferencesingeneexpressionofthe15
corresponding loci. To this end, we downloaded ChIP-seq marks (H3K27me3, H3K27ac, H3K9ac,16
H3K36me3,H3K4me1,H3K4me2,H3K4me3,H3K9me3,Pol2b)andRNA-seqdataforK562andNHEK17
celllinesfromtheENCODEproject(10).ForeachRefSeqlocusweconvertedChIP-seqsignalstomusic18
withfixedparameters(seemethods).RNA-seqdatawereusedtoidentifygenesthataredifferentially19
expressedbetweenthetwocelllines,underap-value<0.01and|logFC|>1,accordingtorecentSEQC20
recommendations(11).21
22
Acommonwaytoclassifymusicisbasedonsummarizationoftrackfeaturesafterspectralanalysis(12,23
13).SuchapproachinvolvesthesummarizationoftrackasMel-FrequencyCepstralCoefficients(MFCC)24
thataresubsequentlyclusteredusingGaussianMixtureModels(GMM).Adistancebetweentrackscan25
thenbedefinedasdescribedin(14),whouseditasaclassifierformusicalgenres.26
27
Wetestedifasimilarapproachcouldbeusedtodevelopapredictorofdifferentialexpressionbased28
onthedistancebetweenmusicaltracksgeneratedfromtwocelllines.29
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Wedefinedadistancebetweensongsasdescribedinmethodsandweoptimizedtheparametersusing1
asatrainingsetthe250geneswiththemostsignificantdifferentialexpressionp-valueandasmany2
genes with the least significant p-value according to RNA-seq (Figure 2). We found that optimal3
performanceisatMFCC=30andGMM=10,withanAUC=0.609.4
5
We summarized tracks representing all RefSeq genes using such parameters, we then compared6
distanceswithdifferentialexpressionperformingaROCanalysis.Ourresultsindicatethatdifferences7
ininformationcontainedinmusicalrepresentationofchromatinsignalsmaybelinkedtodifferential8
expression,althoughpowerofpredictionislimited(AUC=0.5184,p=1.4597e-03).9
10
Similaritybetweenmusicaltracksoverlapssimilarbiologicalproperties11
Asadditional issuewewantedtoassess ifsimilaritiesbetweenmusicalrepresentationofchromatin12
status may be linked to the biology of the underlying genes. To this end, we calculated pairwise13
distancesforallregionsusingparametersidentifiedaboveonK562cellline.Hierarchicalclusteringof14
the distance matrix identifies eight major clusters (Figure 3, left). We performed Gene Ontology15
Enrichmentanalysisoneachcluster,here representedaswordcloudof significant terms (Figure3,16
center,supplementaryTableS2);wefoundthatdifferentclustersarelinkedtogenesshowingdifferent17
biologicalproperties.Forexample,someclusters(6,7and8)werelinkedtoregulationofcellcycle,18
otherswerelinkedtometabolicprocesses(2and5)orvesicletransport(3and4).Wealsoevaluated19
thedistributionofexpression(expressedaslog(RPKM))oftheunderlyinggenes(Figure3,right);we20
foundthatregionsclusteredbythedistancebetweenmusicaltracksbroadlyreflectsgroupsofgenes21
withdifferentlevelofexpression,spottingclustersofhigherexpression(cluster5)orlowerexpression22
(clusers2and3);assessmentofstatisticalsignificanceofdifferencesindistributionofgeneexpression23
valuesamongclustersispresentedinTable1.24
25
Discussion26
Chromatin shape and genome function are governed, among several factors, by the coordinated27
organizationofepigeneticmarks (15).Modificationsof suchmarksaredynamicandare fine-tuned28
duringthelifeofacelloranorganism.Analysisofhistonemodifications,aswellastranscriptionfactors29
andotherproteinsbindingDNA,byChIP-seqalreadydescribedpatternsofenrichmentthatarespecific30
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
totheirrelativefunction(16,17).Analysisofcombinatorialpatternsofhistonemodificationsalready1
unveileditspotentialinunderstandingfunctionalpropertiesofthegenome(18,19)andthecross-talk2
amongmultiplechromatinmarks(20).3
4
Weshow,inthiswork,thattheinformationcarriedbymultiplehistonemodificationscanbecaughtin5
ahuman-friendlywaybytranslatingChIP-seqsignalsintomusicalscores.Althoughtheinvestigationof6
thepsychologicalfactorsthatunderlietunefulperceptionofsonificatedgenomicsignalsisoutofthe7
scopeofthismanuscript,ourresultssuggestthathumanhearingisabletoperceivepatternsconveying8
informationencodedinChIP-seqdataanalyzedandtodistinguishfromrandomnoise.9
10
Weautomatedtheanalysisofdifferencesbetweenmusicaltracksusinganestablishedmethodbased11
on summarization of spectral data. By this approach, we investigated the possible link between12
differencesinwayschromatinsoundsandphenotypicfeatures.Ourresultssuggestthatdifferencesin13
transcript levels can be predicted by the differences of sonificated genomic regions, although14
performancesofsuchapproachare limited.Wereasonedthatmany factorsmayexplainsuchpoor15
results:firstofallthereisavastspaceofparametersthatcanbetunedtocreateasinglemusicaltrack16
andwestilllackmethodstoexploreitefficiently.Inaddition,theMelscaleusedtosummarizeaudio17
signalhasbeendevelopedtomatchhumancapabilitiestoperceivesound(21),henceitmaynotbe18
optimalforthecomparisonofthetracksgeneratedinthiswork.19
20
Ithasalreadybeenshownthatitispossibletopredictlevelsofgeneexpressionstartingfromchromatin21
states, although themethod used to perform chromatin segmentation has a large impact on such22
predictions(22).Inthisworkwefoundthatdifferencesinchromatin-derivedmusicreflects,tosome23
extent,differencesinlevelofexpressionofunderlyinggenesandtheirrelatedbiology.24
25
Toconclude,althoughwecannotadvocatetheusageofmusicalanalysisasuniversaltooltoanalyze26
biologicaldatayet,weconfirmthatquantitativefeaturesonthegenomearepatternedandcontain27
information,hencecanbeconvertedintosoundsthatareperceivedasmusical.Welimitedouranalysis28
onspecificchromatinmodifications,butinprincipleanyquantitativegenomicfeaturecanbeconverted29
andintegratedintoamusicaltrack.Thechoiceofparametersandinstrumentshasbeenstandardized30
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
fortheanalysispresented,forillustrativepurposeweshowthatdifferentsignalsfromthesameregion1
canbecombinedusingdifferent instruments(https://soundcloud.com/davide-cittaro/random-locus-2
blues) and signals fromdifferent genomic regions can bemerged (https://soundcloud.com/davide-3
cittaro/non-homologous-end-joining).4
5
6
MaterialsandMethods7
8
SonificationofENCODEChIP-seqdata9
Rawdata for variousmodificationswere downloaded from (GSE26320). Read tagswere aligned to10
humangenome(hg19)usingbwaaligner(23).Alignmentswereconvertedtobigwigtracks(24)after11
filteringforduplicatesandqualityscorehigherthan15.12
Inordertodefineregionstobeconvertedtomusicscores,weselectedintervalsaroundRefSeqgene13
definition,from1kbupstreamofTSSto2kbdownstreamofTES.ChIP-seqsignalwerefirstlyconverted14
toMIDIusingcustomscripts(https://bitbucket.org/dawe/enconcert)accordingtoparametersdefined15
inTable2.MIDItracksbelongingtothesameregionfromthesamesampleweremergedintoasingle16
MIDIfile,convertedtoWAVformatusingtimiditysoftware(http://timidity.sourceforge.net),withthe17
exceptionoftrackspresentedassupplementaryinformationwhichhavebeenprocessedwithApple18
GarageBandsoftware.19
20
ComparisonofWAVtracks21
In order to compare four samples for each converted genomic region, we extracted MFCC using22
python_speech_featureslibrary(https://github.com/jameslyons/python_speech_features).Selected23
componentswerethenclusteredusingGaussianMixtureModels,implementedinscikit-learnpython24
library0.15.2 (http://scikit-learn.org).Distancebetween two trackswereevaluatedusingHausdorff25
distance(H)betweenGMMclusters.Briefly,wefirstcalculateallpairwisedistancesbetweenGMM26
clustersusingBhattacharyyadistance(B)formultivariatenormaldistributionsas27
28
& = 18 +, − +. /'0. +, − +. + 12 345
'6, 6.
29
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
1
where2
3
' = 6, + 6.2 4
5
then,asGMMarenotordered,wetaketheHausdorffdistance(H)asthemaximumbetweentherow-6
wiseandcolumn-wiseminimumofthepairwisedistancesbetweentwoGMMsets.ROCanalysison7
musicdistanceswasperformedoverthevalueofD,definedas8
9
7 = 345 1 + 8 − 345 1 + : 10
11
where12
13
: = ; <562>, <562@ + ; A;B<>,A;B<@2 14
15
istheaverageofdistancesbetweenreplicatesand16
17
8 = ; <562>, A;B<> + ; <562>, A;B<@ + ; <562@, A;B<> + ; <562@, A;B<@4 18
19
istheaverageofpairwisedistancesamongdifferentcelllines.20
21
Assessmentofdifferentiallyexpressedgenes22
RNA-seq tagswere aligned to reference genomeusing STAR aligner (25). Read counts overRefSeq23
intervalswereextractedusingbedtools.DiscretecountswerenormalizedwithTMM(26),differential24
gene expressionwas evaluated using the voom function implemented in limma (27)with a simple25
contrastbetweentwocelllines.Geneswereconsidereddifferentiallyexpressedunderap-valuelower26
than0.01andabsolutelogarithmFoldChangehigherthan1.27
28
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Clusteranalysis1
Clusteranalysiswasperformedonreplicate1ofK562dataset.WecalculatedallpairwiseHausdorff2
distances among genomic loci as defined above. Data were clustered using the Ward method.3
EnrichmentanalysiswasperformedusingEnrichr (28).Wordcloudswerecreatedwithworld_cloud4
pythonpackage(https://github.com/amueller/word_cloud)usingtextdescriptionofontologieshaving5
positiveEnrichrcombinedscore.DifferentialexpressionamongclusterswasevaluatedusingMann-6
WhitneyU-test.7
8
Acknowledgements9
Authorswouldliketothankallcollaboratorsandrelativeswhokindlysacrificedtheirtimetolistento10
musicgeneratedwhilethisworkwasdeveloped.11
12
13
References14
15161. JuslinPN(2013)Whatdoesmusicexpress?Basicemotionsandbeyond.FrontPsychol4:596.17
2. OhnoS,OhnoM(1986)Theallpervasiveprincipleofrepetitiousrecurrencegovernsnotonly18codingsequenceconstructionbutalsohumanendeavorinmusicalcomposition.19Immunogenetics24(2):71–78.20
3. KingRD,AngusCG(1996)PM--proteinmusic.ComputApplBiosci12(3):251–252.21
4. TakahashiR,MillerJH(2007)Conversionofamino-acidsequenceinproteinstoclassicalmusic:22searchforauditorypatterns.GenomeBiol8(5):405.23
5. LarsenP,GilbertJ(2013)MicrobialBebop:CreatingMusicfromComplexDynamicsinMicrobial24Ecology.PLoSONE8(3):e58119.25
6. ConsortiumT1GP,etal.(2013)Anintegratedmapofgeneticvariationfrom1,092human26genomes.Nature490(7422):56–65.27
7. ZhangY,etal.(2008)Model-basedanalysisofChIP-Seq(MACS).GenomeBiol9(9):R137.28
8. RobinsonMD,McCarthyDJ,SmythGK(2009)edgeR:aBioconductorpackagefordifferential29expressionanalysisofdigitalgeneexpressiondata.doi:10.1093/bioinformatics/btp616.30
9. KouzaridesT(2007)ChromatinModificationsandTheirFunction.Cell128(4):693–705.31
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
10. ENCODEProjectConsortium,ConsortiumTEP(2012)AnintegratedencyclopediaofDNA1elementsinthehumangenome.Nature489(7414):57–74.2
11. SEQC/MAQC-IIIConsortium,SEQC/MAQC-IIIConsortium(2014)Acomprehensiveassessment3ofRNA-seqaccuracy,reproducibilityandinformationcontentbytheSequencingQuality4ControlConsortium.doi:10.1038/nbt.2957.5
12. LoganB(2000)MelFrequencyCepstralCoefficientsforMusicModeling.ISMIR.6
13. DoplerM,SchedlM,PohleT,KneesP(2008)AccessingMusicCollectionsViaRepresentative7ClusterPrototypesinaHierarchicalOrganizationScheme.ISMIR.8
14. AucouturierJJ,PachetF,SandlerM(2005)“ThewayitSounds”:timbremodelsforanalysisand9retrievalofmusicsignals.IEEETransMultimedia7(6):1028–1035.10
15. ZhouVW,GorenA,BernsteinBE(2011)Chartinghistonemodificationsandthefunctional11organizationofmammaliangenomes.NatRevGenet12(1):7–18.12
16. BarskiA,etal.(2007)High-resolutionprofilingofhistonemethylationsinthehumangenome.13Cell129(4):823–837.14
17. ValouevA,etal.(2008)Genome-wideanalysisoftranscriptionfactorbindingsitesbasedon15ChIP-Seqdata.NatMeth5(9):829–834.16
18. ErnstJ,KellisM(2012)ChromHMM:automatingchromatin-statediscoveryand17characterization.NatMeth9(3):215–216.18
19. SongJ,ChenKC(2015)Spectacle:fastchromatinstateannotationusingspectrallearning.19GenomeBiol16(1):33.20
20. PernerJ,LasserreJ,KinkleyS,VingronM,ChungHR(2014)Inferenceofinteractionsbetween21chromatinmodifiersandhistonemodifications:fromChIP-Seqdatatochromatin-signaling.22NucleicAcidsRes42(22):13689–13695.23
21. StevensSS,VolkmannJ,NewmanEB(1937)AScalefortheMeasurementofthePsychological24MagnitudePitch.TheJournaloftheAcousticalSocietyofAmerica8(3):185–190.25
22. HoffmanMM,etal.(2013)IntegrativeannotationofchromatinelementsfromENCODEdata.26NucleicAcidsRes41(2):827–841.27
23. LiH,DurbinR(2010)Fastandaccuratelong-readalignmentwithBurrows-Wheelertransform.2826(5):589–595.29
24. KentWJ,ZweigAS,BarberG,HinrichsAS,KarolchikD(2010)BigWigandBigBed:enabling30browsingoflargedistributeddatasets.doi:10.1093/bioinformatics/btq351.31
25. DobinA,etal.(2012)STAR:ultrafastuniversalRNA-seqaligner.29(1):15–21.32
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
26. RobinsonMD,OshlackA(2010)Ascalingnormalizationmethodfordifferentialexpression1analysisofRNA-seqdata.GenomeBiol11(3):R25.2
27. RitchieME,etal.(2015)limmapowersdifferentialexpressionanalysesforRNA-sequencingand3microarraystudies.NucleicAcidsRes.doi:10.1093/nar/gkv007.4
28. ChenEY,etal.(2013)Enrichr:interactiveandcollaborativeHTML5genelistenrichment5analysistool.BMCBioinformatics14:128.6
78
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
FigureLegends1
Figure1:Graphicalrepresentationotheapproachusedtotransformquantitativesignalstomusic.ChIP-2
seqvalues(H3K4me3intheexample)arebinnedinfixed-sizeintervalsoverthegenome.Eachinterval3
correspondstoa1/8note.Averagevaluesoflog-transformofreadcountsineachgenomicbin(red4
lines)arematchedintopredefinednumberofsemitones(chromaticscale).Notesmaybemappedtoa5
specified scale (major andminor scales areexemplified in the figure). Consecutiveequal notes are6
mergedinsinglenotewithdoubleduration.Valuesfallinginthefirstbinareconsideredrests.7
8
Figure2:Evaluationofdifferentcombinationofparametersinpredictingdifferentialgeneexpression9
on a gold-standard subset of 500 genes. Each square corresponds to a number ofMel-Frequency10
CepstralCoefficients(MFCC)usedtosummarizesignalandanumberofcentersforGaussianMixture11
Model(GMM).ColorsaregivenbythecorrespondingAreaUndertheCurve(AUC)12
Figure3:Hierarchicalclusteringofgenomicregionsidentifies8mainclusters(left).Eachclusterbroadly13
correspondstospecificbiologicalpropertiesaccordingtoGeneOntologyenrichedterms(middle).Level14
ofexpressionofgenesincludedineachclustershowspecificdistributions(right).15
16
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7 Cluster8
Cluster1 2.635e-23 3.548e-18 8.127e-37 8.627e-125 2.859e-83 3.231e-71 3.169e-63
Cluster2 1.068e-01 2.008e-01 4.005e-01 2.801e-01 7.907e-02 9.467e-02
Cluster3 2.880e-01 2.225e-02 1.399e-01 3.910e-01 3.745e-01
Cluster4 4.159e-02 3.025e-01 2.614e-01 2.821e-01
Cluster5 5.098e-04 3.898e-09 7.586e-07
Cluster6 1.259e-02 2.866e-02
Cluster7 4.545e-01
Table1p-valuesofMann-WhitneyU-testfordifferencesindistributionofexpressionbetweenclusters.Testsshowingsignificantdifference1(p≤0.05)arepresentedinboldface.2
3
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
1Antibody Scale Octave Key TickSize BinSize
H3K27me3 minor 4 B 600 400
H3K27ac minor 3 B 900 600
H3K9ac minor 3 B 1200 800
H3K36me3 minor 4 B 300 200
H3K4me1 minor 3 B 1200 800
H3K4me2 minor 3 B 300 200
H3K4me3 minor 3 B 300 200
H3K9me3 minor 4 B 300 200
Pol2 minor 4 B 600 400
Table2–ParametersusedtoconvertdifferentChIP-seqsignalsintocorrespondingmusicaltrakcs2
34
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint
.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint