Poster/cheatsheet for R/BioC package genomation



Summarize, annotate and visualize genomic intervals with R/BioC package genomation

Genomic intervals
Genomic intervals are the basis of genome annotation. Intervals can contain only location information (e.g. TFBS locations) or a variety of scores on different scales, such as percent methylation, ChIP-seq enrichment or read coverage.

Windows/regions of interest over the genome
Summarizing genomic intervals over different sub-genomic contexts or windows of interest is the necessary first step in making inferences about the biological importance of the data. The windows can be of equal length (e.g. pre-defined regions around the TSS) or not (e.g. exons, transcripts or CpG islands).
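As a sketch of the two kinds of windows, the snippet below starts from a few hypothetical transcripts of different lengths (unequal-width windows) and derives equal-width windows around each TSS with GenomicRanges' promoters(). The transcript coordinates and the 1 kb flank sizes are made up for illustration.

```r
library(GenomicRanges)

# Hypothetical transcripts of different lengths (unequal-width windows)
tx <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges   = IRanges(start = c(5000, 20000, 8000),
                                 width = c(3000, 1500, 12000)),
              strand   = c("+", "-", "+"))

# Equal-width windows: 1 kb upstream and 1 kb downstream of each TSS.
# promoters() is strand-aware, so the TSS of a "-" transcript is its end.
tss.windows <- promoters(tx, upstream = 1000, downstream = 1000)

width(tx)           # 3000 1500 12000
width(tss.windows)  # 2000 2000 2000
```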

Extract genomic intervals for windows of interest
Genomic intervals can be extracted for windows of interest and stored in matrix format. If the windows are not of equal length, binning strategies can be used to divide every window into the same number of bins, so that a matrix can still be used to store the information.
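The binning idea can be illustrated in base R: each window's per-base signal is averaged into the same number of bins, giving one matrix row per window. In genomation this job is done by ScoreMatrixBin(); the helper bin_scores() and the toy windows below are hypothetical, and bins formed with cut() may not match the package's exact bin boundaries.

```r
# Hypothetical helper: average per-base scores into n.bins bins.
# Assumes each window is at least n.bins bases long (no empty bins).
bin_scores <- function(scores, n.bins) {
  # assign each base position to one of n.bins equal-width intervals
  bins <- cut(seq_along(scores), breaks = n.bins, labels = FALSE)
  # mean score per bin, in bin order
  as.numeric(tapply(scores, bins, mean))
}

# Three toy windows of different length (e.g. CpG islands), per-base signal
windows <- list(
  w1 = c(1, 1, 2, 2, 3, 3),
  w2 = c(0, 0, 0, 4, 4, 4, 4, 4, 4),
  w3 = c(5, 5, 5)
)

# One row per window, one column per bin
mat <- t(vapply(windows, bin_scores, numeric(3), n.bins = 3))
mat
#    [,1] [,2] [,3]
# w1    1    2    3
# w2    0    4    4
# w3    5    5    5
```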

Visualize the summary of genomic intervals
The matrices described above can be visualized using heatmaps or meta-region plots. Meta-region plots show the average signal of the genomic intervals over the windows of interest; this can be a line plot or a heatmap in which colors indicate average values. Another type of heatmap shows the matrix itself, i.e. the signal of the genomic intervals over each window. In these heatmaps, every row represents a window and every column is a base position or a bin.
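A base-R sketch of the two views, using a small hypothetical score matrix: the meta-region profile is the column-wise mean across windows, and the per-window heatmap draws the matrix directly. genomation's plotMeta() and heatMatrix() provide polished versions of these plots; the base graphics calls here are only an illustration.

```r
# Toy score matrix (hypothetical values): rows = windows, columns = bins
mat <- matrix(c(0, 1, 3, 1, 0,
                0, 2, 5, 2, 0,
                1, 1, 4, 1, 1),
              nrow = 3, byrow = TRUE)

# Meta-region profile: average signal in each column across all windows
meta <- colMeans(mat)   # c(1/3, 4/3, 4, 4/3, 1/3)

# Meta-region line plot
plot(meta, type = "l", xlab = "bin", ylab = "average score")

# Heatmap of the matrix itself; rows are reversed so window 1 is drawn on top
image(t(mat[nrow(mat):1, ]), axes = FALSE, xlab = "bin", ylab = "window")
```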

Read genomic intervals into R
Genomic intervals of any kind can be read into R using the functions below, all of which return GRanges or GRangesList objects: readGeneric(file, …) reads generic text files with genomic interval information. readBed(file, ...) reads BED files. readTranscriptFeatures(file, …) reads BED12 files with exon/intron structure. gffToGRanges() reads a GFF file. readBroadPeak(), readNarrowPeak() and readFeatureFlank() are other convenience functions for reading BED-like files.
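A minimal sketch of reading intervals with two of these functions from small temporary files. The file contents and column layout are made up for illustration; the arguments used (chr, start, end, meta.cols, header) are the documented readGeneric() arguments, but check ?readGeneric and ?readBed for your installed version.

```r
library(genomation)
library(GenomicRanges)   # for mcols()

# A small 6-column BED file (toy data): chr, start, end, name, score, strand
bed.file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tpeak1\t5\t+",
             "chr1\t300\t450\tpeak2\t8\t-",
             "chr2\t50\t120\tpeak3\t2\t+"), bed.file)
# readBed() has a zero.based argument for BED's 0-based start coordinates
peaks <- readBed(bed.file)

# A generic tab-separated file with a header and a score column (toy data)
txt.file <- tempfile(fileext = ".txt")
writeLines(c("chrom\tbegin\tstop\tmethylation",
             "chr1\t1000\t1001\t85.2",
             "chr1\t2000\t2001\t12.5"), txt.file)
# Columns 1-3 give the location; column 4 is kept as metadata "methylation"
meth <- readGeneric(txt.file, chr = 1, start = 2, end = 3,
                    meta.cols = list(methylation = 4), header = TRUE)
mcols(meth)$methylation
```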

Extract genomic intervals for windows of interest in R
Genomic intervals can be extracted for windows of interest using the ScoreMatrix(target, windows) and ScoreMatrixBin(target, windows) functions. These functions accept BAM files, BigWig files and GRanges objects as input. patternMatrix() returns the relative locations or scores of k-mers or motifs, which is useful for analyzing ChIP-seq data. All of these return ScoreMatrix or ScoreMatrixList objects, which can be manipulated further with the orderBy, binMatrix, scaleScoreMatrix, scaleScoreMatrixList, intersectScoreMatrixList and [ (subsetting) functions.
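A sketch of both extraction functions on the example data that ships with genomation: cage (CAGE tags with a "tpm" score column), promoters (equal-width windows) and cpgi (CpG islands of varying width). The choice of 10 bins and of the first 10 rows for subsetting is arbitrary.

```r
library(genomation)

data(cage)
data(promoters)
data(cpgi)

# Equal-width windows: one column per base, CAGE tpm used as the score
sm <- ScoreMatrix(target = cage, windows = promoters,
                  strand.aware = TRUE, weight.col = "tpm")

# Unequal-width windows: each CpG island split into 10 bins, mean score per bin
smb <- ScoreMatrixBin(target = cage, windows = cpgi,
                      bin.num = 10, bin.op = "mean", weight.col = "tpm")

# Further manipulation: row-wise scaling, and subsetting with [
sm.scaled <- scaleScoreMatrix(sm)
sm.first  <- sm[1:10, ]
```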

Visualize the summary of genomic intervals in R
heatMatrix(scoreMatrixObj) and multiHeatMatrix(scoreMatrixListObj) make heatmaps for the windows of interest. plotMeta() makes meta-region plots and heatMeta() makes meta-region heatmaps. These functions return their values invisibly; see the respective help pages for details.
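A sketch of the plotting functions on genomation's example data. A ScoreMatrixList is built from two targets over the same promoter windows, and intersectScoreMatrixList() keeps only windows present in every matrix so the heatmaps line up row by row. Default plotting arguments are used; the choice of targets is only for illustration.

```r
library(genomation)

data(cage)
data(promoters)
data(cpgi)

sm <- ScoreMatrix(target = cage, windows = promoters,
                  strand.aware = TRUE, weight.col = "tpm")

# One row per promoter window, one column per base
heatMatrix(sm)
# Meta-region line plot: average score at each position
plotMeta(sm)

# Two targets over the same windows (coverage, since no weight column is set)
sml <- ScoreMatrixList(targets = list(cage = cage, cpgi = cpgi),
                       windows = promoters, strand.aware = TRUE)
# Keep only windows shared by all matrices
sml <- intersectScoreMatrixList(sml)

multiHeatMatrix(sml)  # side-by-side heatmaps
heatMeta(sml)         # meta-region heatmap, one row per target
```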

[Figure: example output. Panels: meta-region plots (x-axis: base-pairs around anchor; y-axis: reads per million), meta-region heatmaps, heatmaps for genomic interval sets (TF1 to TF4), and pie charts for annotation (Intergenic, Intron, Exon, Promoter).]

Annotation for genomic intervals in R
Annotation summaries for target genomic intervals can be obtained with annotateWithFeatures(). The resulting object can be visualized with plotTargetAnnotation() as a pie chart and with heatTargetAnnotation() as a heatmap of annotation overlap percentages.
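A sketch of the annotation workflow on genomation's example data. The feature set (promoters and CpG islands) and the second target set (CAGE tags with tpm at or above the median) are arbitrary choices for illustration; granges() strips metadata columns so the two feature sets can be combined into one GRangesList.

```r
library(genomation)
library(GenomicRanges)

data(cage)
data(promoters)
data(cpgi)

# Annotation features, named by type; metadata columns dropped
features <- GRangesList(promoters = granges(promoters),
                        cpgi      = granges(cpgi))

# Percent of target intervals overlapping each feature type
ann <- annotateWithFeatures(cage, features)
ann
plotTargetAnnotation(ann)   # pie chart

# A second target set: CAGE tags with tpm at or above the median
cage.strong <- cage[mcols(cage)$tpm >= median(mcols(cage)$tpm)]
ann.strong  <- annotateWithFeatures(cage.strong, features)

# Heatmap of overlap percentages, one row per target set
heatTargetAnnotation(list(all = ann, strong = ann.strong))
```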

Contributors: Altuna Akalin [aut, cre], Vedran Franke [aut, cre], Katarzyna Wreczycka [aut], Alexander Gosdschan [ctb], Liz Ing-Simmons [ctb]
Citation: Akalin A, Franke V, Vlahovicek K, Mason CE, Schubeler D. (2015). Bioinformatics. doi: 10.1093/bioinformatics/btu775

[Figure panels: Genomic intervals with different kinds of information; Extract subset of genomic intervals for windows of interest; Meta-region line plots; Meta-region heatmap; Heatmaps for genomic intervals; Annotation pie charts; Annotation heatmaps]

Annotation for genomic intervals
Genomic intervals such as ChIP-seq peaks or differentially methylated regions need to be further annotated with known genomic annotations, such as the promoter, intron and exon structures of genes. This gives additional information for the functional characterization of the genomic intervals. The result is usually represented as the percentage of target intervals overlapping each type of annotation feature.
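The percentage summary can be sketched in base R on a single toy chromosome. This simplified version counts a target as overlapping a feature type if it touches any interval of that type, with no precedence between types, so percentages can add up to more than 100; genomation's own computation may differ. All intervals and the helper overlaps_any() are hypothetical.

```r
# Hypothetical target intervals (e.g. peaks) on one chromosome
targets <- data.frame(start = c(10, 120, 400, 800),
                      end   = c(30, 150, 420, 820))

# Hypothetical annotation features, one table per feature type
features <- list(
  promoter = data.frame(start = c(0, 390), end = c(50, 450)),
  exon     = data.frame(start = 100,       end = 130)
)

# TRUE if [s, e] overlaps any interval in f (closed coordinates)
overlaps_any <- function(s, e, f) any(f$start <= e & f$end >= s)

# Logical matrix: rows = targets, columns = feature types
hit <- sapply(features, function(f)
  mapply(overlaps_any, targets$start, targets$end, MoreArgs = list(f = f)))

perc       <- 100 * colMeans(hit)           # promoter 50, exon 25
intergenic <- 100 * mean(rowSums(hit) == 0) # 25
```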

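URLs: http://bioinformatics.mdc-berlin.de/genomation and http://www.bioconductor.org/release/genomation

Install genomation from Bioconductor (biocLite() has been replaced by BiocManager):

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("genomation")

OR install it from GitHub:

devtools::install_github("BIMSBbioinfo/genomation", build_vignettes = FALSE)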

Visualize summaries and annotation for genomic intervals

Misc. functions
getRandomEnrichment() calculates the significance of the association between two sets of genomic intervals.
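The general permutation idea behind such a test can be sketched in base R: count how many query intervals overlap the target, then compare that count with counts obtained after placing the query intervals at random. This is a simplified illustration on one toy chromosome, not the package's implementation; count_overlaps(), the chromosome length and all intervals are hypothetical, and ?getRandomEnrichment documents the actual procedure and arguments.

```r
# Number of query intervals overlapping at least one target interval
count_overlaps <- function(q.start, q.end, t.start, t.end) {
  sum(vapply(seq_along(q.start), function(i)
    any(t.start <= q.end[i] & t.end >= q.start[i]), logical(1)))
}

# Hypothetical toy data on a single 10 kb chromosome
chrom.len <- 10000
t.start <- c(100, 2000, 5000); t.end <- c(300, 2300, 5400)  # target
q.start <- c(150, 2100, 5100, 8000); q.end <- q.start + 49  # query, 50 bp

observed <- count_overlaps(q.start, q.end, t.start, t.end)  # 3

set.seed(42)
n.rand  <- 1000
q.width <- q.end - q.start + 1
random.counts <- replicate(n.rand, {
  # place each query interval uniformly at random, keeping its width
  s <- sample.int(chrom.len - max(q.width) + 1, length(q.start), replace = TRUE)
  count_overlaps(s, s + q.width - 1, t.start, t.end)
})

# Empirical p-value with the usual +1 correction
p.value <- (sum(random.counts >= observed) + 1) / (n.rand + 1)
```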

Valid for package version >= 1.5.6