Intro to Biocoductor and GeneGraph by Kyrylo Bessonov ([email protected]) 9 Oct 2012.

27
Intro to Biocoductor and GeneGraph by Kyrylo Bessonov ([email protected]) 9 Oct 2012

Transcript of Intro to Biocoductor and GeneGraph by Kyrylo Bessonov ([email protected]) 9 Oct 2012.

Page 1: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Intro toBiocoductor and GeneGraph

by Kyrylo Bessonov

([email protected])

9 Oct 2012

Page 2: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Bioconductor repository

• Is the repository with extensions and libraries for R-language found at http://www.bioconductor.org/

• Bioconductor libraries cover• Micorarray analysis• Genetic variants analysis (SNPs)• Sequence analysis (FASTA, RNA-seq)• Annotation (pathways, genes)• High-troughput assays (Mass Spec)

• All libraries are free to use and contain documentation (i.e. vignettes)• Vignettes are short and require some previous

knowledge of R and/or other defendant libraries

2

Page 3: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Configuring R for Bioconductor

• To install Bioconductor libraries the R environment needs to be configuredsource("http://bioconductor.org/biocLite.R")

• To download any Bioconductor library use biocLite("package_name") functionbiocLite("GenomeGraphs")

• Load the downloaded library functions via the library("lib_name")functionlibrary("GenomeGraphs")

• Read help file(s) via the ?? or browseVignettes()??library_name (e.g. ??GenomeGraphs)

browseVignettes(package = "GenomeGraphs")

3

Page 4: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Genome Data Display and Plotting

The GenomeGraphs libraryAuthors: Steen Durinck and James Bullard

(Bioconductor library)

Page 5: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Intro to GenomeGraphs library

• Allows to retrieve data from• Ensembl comprehensive DB on genomes

• intron/exon locations• sequence genetic variation data• protein properties (pI, domains, motifs)

• GenomeGraphs Bioc library allows to display data on:• Gene Expression (expression / location)• Comparative genomic hybridization (CGH) • Sequencing (e.g. variations/introns/exons)

5

Page 6: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

GenomeGraphs: Plotting with gdPlot

• gdPlot main plotting functiongdPlot(gdObjects, minBase, maxBase ...)

•gdObjects = any objects created by GenomeGraphsa) BaseTrack; b) Gene; c) GenericArray; d) RectangleOverlay

•minBase the lowest nt position to be plotted (optional)•maxBase the largest nt to display (optional)

• Objects are data structures that hold/organize a set of variables1) Create an object to plot using makeBaseTrack() functionObjBaseT = makeBaseTrack(1:100, rnorm(1:100),strand = "+")

2) Plot the newly created objectgdPlot(ObjBaseT)

• Display object structure / properties (e.g. of BaseTrack)attributes(ObjBaseT)

•$ sign next to the name represents object variables

• Change or view individual object variablesattr(ObjBaseT, “strand")

6

Page 7: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Manipulating GenomeGraphs class graphical properties

• Changing values of variables (classical way)attr(Object, "variable/parameter") = value

attr(ObjBaseT, "strand") = "-“

•If an array, to access individual elements use [element number 1..n] attr(ObjBaseT, "variable/parameter")[number] = new_value

attr(ObjBaseT, "base")[1] = 0

• Changing graphical parameters such as colorshowDisplayOptions(ObjBaseT)alpha = 1 lty = solid color = orange lwd = 1 size = 5 type = p getPar(ObjBaseT, "color")[1] "orange"setPar(ObjBaseT, "color", "blue")

7

Page 8: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

gdPlot composite/group plots

• gdPlot(…) can take any number of gdObjects to plot• Let’s plot two BaseTrack objects a and b

a = makeBaseTrack(1:100, rlnorm(100), strand = "+")

b = makeBaseTrack(1:100, rnorm(100), strand = "-")

ab=list(a,b)

gdplot(ab)

8

Page 9: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Manipulating values of the grouped gdObjects

• ab is list of objects a and b• To access individual object within the list use [ ]

ab[1] will display object a

ab[2] will display object b• To modify the grouped object use double [[ ]]

•To change color of b to red getPar(ab[[2]], "color")

setPar(ab[[2]], "color", "red") -OR- (re-create object)

b = makeBaseTrack(1:100,

rnorm(100), strand = "-",

dp=DisplayPars(color="red"))

ab=list(a,b)

gdPlot(ab)9

Page 10: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Plotting with labels and legend

• gdPlot does not provide functions to label axis• Trick = “use labeled / tagged” objects

"label" = GenomeGraph object

"+ strand"= makeBaseTrack(1:100, rlnorm(100), strand = "+")

• To display legend use makeLegend("text","color") ab=list("+"=a, "-"=b, makeGenomeAxis(),

makeLegend(c("+", "-"),c("orange", "red")) )

10

Page 11: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Retrieving and Displaying Data from

Public Database

Combining capabilities of

biomaRt and GenomeGraph libraries

Page 12: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Welcome to biomaRt• biomaRt library allows to retrieve data from public DBs

ensembl ENSEMBL GENES 68 (SANGER UK)

snp ENSEMBL VARIATION 68 (SANGER UK)

unimart UNIPROT (EBI UK)

bacteria_mart_14 ENSEMBL BACTERIA 14 (EBI UK)

***Use listMarts() to see all available databases***

• Let’s retrieve gene data of the Bacillus subtilis strain•useMart(database,dataset)allows to connect to specified database and dataset within this database

db=useMart("bacteria_mart_14")

listDatasets(db)

Dataset Description version

… … …

bac_6_gene Bacillus subtilis genes (EB 2 b_subtilis) EB 2 b_subtilis

db=useMart("bacteria_mart_14", "bac_6_gene")12

Page 13: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Exploring biomaRt object• listAttributes()shows all prop. of the biomaRt obj.

• The db object has total of 4175 genes• Use getBM(attribute, filter, value, biomaRt_obj)to

extract values belonging to specified attributes• attribute: general term such as gene

name/chromosome # / strand (+ or - )• filter: parameter applied on attribute such as

genomic region to consider (i.e. start and end in nt)• value: actual values of the applied filter(s)

getBM(c("external_gene_id", "description","start_position", "end_position", "strand"), filters = c("start", "end"),

values = list(1,10000), db)

ID description start(nt) end(nt) strand

metS Methionyl-tRNA synthetase 45633 47627 1

ftsH Cell division protease ftsH homolog 76984 78897 1

hslO 33 kDa chaperonin 79880 80755 1

DgkDeoxyguanosine kinase 23146 23769 -1 13

Page 14: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Plotting the selected “Genome Region”

• Create an object with makeGeneRegion()function

makeGeneRegion(start,end,chromosome name,

strand, biomaRt object, plotting options)

• Find notation used for the chromosome naminggetBM("chromosome_name","","",db)

chromosome_name

1 ChromosomegRegion = makeGeneRegion(1, 10000, chr = "Chromosome",

strand = "+", biomart = db, dp =DisplayPars(plotId = TRUE, idRotation = 90,

cex = 0.8, idColor = "black"))

gdPlot(list(gRegion, makeGenomeAxis(), makeTitle("Position(nt)",cex=3,"black",0.1)))

14

Page 15: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Bacillus subtilis genome region (intron / exon)

15

ensembl_gene_id

Page 16: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Mapping Expression data RNAseq

GenomeArray()

Page 17: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Intro into RNA-seq data

• HT sequencing technologies allow to sequence mRNA in a series of short contigs of 50-200 bp

• In addition to gene expression analysis it possible to• Map location of introns (UTRs) / exons • Principal:  one searches for a rapid changes in

abundance of the RNA-Seq signal (contigs)• Integration of sequence + expression information• Ugrappa Nagalakshmi et. al. 2008 had used this

strategy to accurately map yeast genome1 • Task: Display part of the seqDataEx dataset having both

• abundance of mRNA (cDNA) transcripts and• annotated yeast genome

171Ugrappa Nagalakshmi et. al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008

Page 18: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Plotting RNA-seq data with GenomeArray() • Need to get mRNA contig abundance data

data("seqDataEx")

rnaSeqAb=seqDataEx$david

• Create GeneticArray object with makeGenericArray(intensity, probeStart, probeEnd, trackOverlay, dp = NULL)

•intensity either microaray or RNAseq transcript abundance signal•probStart start position of the probe (location in nt)

• Plot contig abundance w.r.t. to genomic locationgdPlot(list("abun"=makeGenericArray(rnaSeqAb[, "expr", drop = FALSE], rnaSeqAb [, "location"]) , makeGenomeAxis() ) )

18

Page 19: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Adding genomic annotation to the prev plot

• Need to get annotated yeast genome dataannotGen = useMart("ensembl", "scerevisiae_gene_ensembl")

• Create mRNA contig abundancemRNAabun = makeGenericArray(rnaSeqAb[, "expr",

drop = FALSE], rnaSeqAb [, "location"])

• Create annotated seq. covering the mRNA contigs locationannotSeq= makeGeneRegion(start = min(rnaSeqAb[, "location"]),

end = max(rnaSeqAb[, "location"]), chr = "IV",

strand = "+", biomart = annotGen,

dp = DisplayPars(plotId = TRUE, idRotation = 0,

cex = 0.85, idColor = "black", size=0.5))

• Combine objects and plot themgdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq),

1299000,1312000)19

Page 20: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Resulting plot:Transcript abundance w.r.t. to location

20

Page 21: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Overlays of Basic Shapes and Custom Text

• To overlay rectangle use makeRectangleOverlay() makeRectangleOverlay(start, end, region = NULL,

coords = c("genomic", "absolute"), dp)

rectOver = makeRectangleOverlay( 1301500, 1302200, region=c(1,2), "genomic", DisplayPars(alpha = 0.5))

• To overlay text use makeTextOverlay()tOver = makeTextOverlay("Ribosomal Large subunit",

1302000, 0.95, region = c(1,1),

dp = DisplayPars(color = "red"))

• Combine all overlay objects into one vector with c(v1,v2)gdPlot( list("abund" = mRNAabun,makeGenomeAxis(), "+" = annotSeq), overlays=c(rectOver, tOver) )

21

Page 22: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Overlay of rectangle and text

22

Page 23: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Alternative splicing of transcript

makeTranscript(id, type, biomart, dp = NULL)

Page 24: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Displaying alternative splicing of a gene• mRNA coming from the same ORF could be spliced in

many ways • E.g. case of VDR genes of IgG• Given biomaRt object, the makeTranscript()will

extract splicing information for given id (i.e. gene)• Download human genome databse hGenome <- useMart("ensembl", "hsapiens_gene_ensembl")

• Select Ensembl ID to look at head(getBM(c("ensembl_gene_id", "description"),"","", hGenome))

• Get splicing data from biomaRt object (hGenome)spliceObj = makeTranscript("ENSG00000168309",

"ensembl_gene_id" ,hGenome)

• Plot the object with gdPlotgdPlot(list(makeTitle("Transcript ID:

ENSG00000168309"),splicingObj, makeGenomeAxis()) )

24

Page 25: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

The final result

25

Page 26: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Conclusion

• GeneGraphs provides a wide range to plot genomic data• Can use external databases through biomaRt• Main useful features

• identifies exons/introns• allows to cross-reference expression / genome data• flexible albeit complex plotting capabilities• allows to overlay graphical objects and text• ability to create custom legends• annotation capabilities provided by powerful biomaRt

26

Page 27: Intro to Biocoductor and GeneGraph by Kyrylo Bessonov (kbessonov@ulg.ac.be) 9 Oct 2012.

Thank you for your patience!&

Happy Bioconductor/R Exploration!