Post on 06-Aug-2020
__________________________________________________________________________________________________
Fall 2015 GCBA 815
Tools and Algorithms in Bioinformatics
GCBA815, Fall 2015
Week 9: Creating Heat Maps and Circos Diagrams
You Li, PhD Candidate (Guda lab)
Department of Genetics, Cell Biology and Anatomy
University of Nebraska Medical Center
__________________________________________________________________________________________________
Outline
• Heatmaps using R for
• Microarray data
• RNA-seq data
• Genome-wide circular plots (Circos)
__________________________________________________________________________________________________
Heatmap
A heat map is a graphical representation
of data where the individual values
contained in a matrix are represented as
colors. (Wikipedia)
__________________________________________________________________________________________________
Heatmap
• Widely used for representing gene expression
• Patterns are easier to see in a heatmap than in
a purely numeric matrix
Heatmap (Left) vs. Matrix (Bottom)
__________________________________________________________________________________________________
Create Heatmaps using R
• R is a language and environment for
statistical computing and graphics. (www.r-project.org)
• Available for Windows, Linux, and Mac
• Easy to learn
__________________________________________________________________________________________________
Create Heatmaps using R
• Download and install R
• ISU mirror download site http://goo.gl/gn3PJf
__________________________________________________________________________________________________
R basics
• R command basics
• Change the current working directory
• Note: in R, "C:\Users\name\" is not read literally,
because "\" is the escape character inside strings.
Write "\\" for each directory delimiter
(or use "/" instead)
• Note: commands are case-sensitive
> setwd("X:\\path\\to\\your\\workdir\\")
> getwd()
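To make the backslash note concrete, here is a minimal base-R sketch. The path C:\Users\name is only a placeholder, and nothing here changes the working directory. It illustrates that "\\" in source code is stored as a single backslash, and that the forward-slash form names the same location.

```r
# "\\" in R source code is ONE backslash character in the stored string
p_back <- "C:\\Users\\name"
nchar(p_back)        # counts the characters actually stored
cat(p_back, "\n")    # cat() prints the raw string without escapes

# Forward slashes need no escaping
p_fwd <- "C:/Users/name"

# Convert one form to the other (fixed = TRUE treats the pattern literally)
converted <- gsub("\\", "/", p_back, fixed = TRUE)
identical(converted, p_fwd)

# file.path() joins path components with "/" on every platform
joined <- file.path("C:", "Users", "name")
joined
```

Either form can be passed to setwd(); the forward-slash version is usually easier to type and read.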
__________________________________________________________________________________________________
R basics
• Variables, assignment, and calculation
> a <- 100          # assign 100 to variable 'a'
> a                 # print 'a'
> a+2               # calculate 'a+2'
> a <- log(a, 10)   # assign the value log10(a) back to 'a'
• Vectors (1D arrays) and matrices
> c(1,2,3,4,5)          # vector of five elements
> seq(1,5)              # same values as the previous one
> seq(1, 10, 2)         # seq(from, to, by)
> rnorm(6)              # 6 random values from the standard normal distribution
> matrix(rnorm(20), 5)  # random 2D matrix with 5 rows and 20 elements (5 x 4)
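The commands above can be checked with a short sketch like the following. It uses only base R; set.seed() and the variable names v and m are additions for illustration, so the random numbers are reproducible.

```r
set.seed(1)                     # make the random numbers reproducible

v <- c(1, 2, 3, 4, 5)
all(v == seq(1, 5))             # seq(1, 5) holds the same values as c(1,2,3,4,5)
odd <- seq(1, 10, 2)            # from 1, up to at most 10, in steps of 2
odd

m <- matrix(rnorm(20), 5)       # nrow = 5, so ncol = 20 / 5 = 4
dim(m)                          # number of rows, then number of columns
m[2, 3]                         # single element: row 2, column 3
m[, 1]                          # the whole first column
```

Indexing with m[row, column] is what later lets you pick out individual genes (rows) or samples (columns) from an expression matrix.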
__________________________________________________________________________________________________
R basics
• Getting help
• Get the manual of a command using
help(command) or ?command
> help(rnorm)
> ?rnorm
__________________________________________________________________________________________________
Create Heatmaps using R
• Generate a simple heatmap for a random matrix
> rmat <- matrix(rnorm(100), 20)
> heatmap(rmat)
__________________________________________________________________________________________________
Create Heatmaps using R
• Heatmap for Microarray
• Expression values are represented as fold change
(log2 transformed) relative to the control, so they
can be negative.
• Symmetric around zero: negative values are
under-expressed, positive values are over-expressed
Under-expressed Over-expressed
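As a small worked example of why log2 fold change is symmetric around zero, consider the sketch below. The intensities are made up for illustration and are not taken from the workshop files.

```r
# Hypothetical intensities for three genes (illustration only)
control <- c(geneA = 100, geneB = 100, geneC = 100)
treated <- c(geneA = 25,  geneB = 100, geneC = 400)

# log2 fold change of treated relative to control
log2fc <- log2(treated / control)
log2fc
```

A four-fold decrease and a four-fold increase land at the same distance from zero (-2 and 2), while an unchanged gene sits exactly at 0. This is why a diverging color scale centered on zero suits microarray data.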
__________________________________________________________________________________________________
Create Heatmaps using R
• Heatmap for RNA-seq
• Expression values are represented as normalized read
counts (non-negative): an absolute expression value.
• Counts span several orders of magnitude, so a log10
transformation is needed for the heatmap
y=log10(x+1)
High-expression
Low-expression
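The transformation y=log10(x+1) can be tried on a few numbers first. The counts below are invented for illustration. The +1 is there so that a count of zero maps to 0 instead of -Inf.

```r
# Hypothetical normalized read counts spanning several orders of magnitude
counts <- c(0, 9, 99, 999, 9999)

log10(0)                      # -Inf: this is why 1 is added first
y <- log10(counts + 1)
y
```

After the transformation, each step of 1 on the color scale corresponds to a ten-fold change in expression.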
__________________________________________________________________________________________________
Preparation
• Install and load the gplots package
> install.packages(pkgs = "gplots")
> library("gplots")
• Create the directory C:\Users\NAME\graph_workshop
• Download test files from https://goo.gl/tsvd82
and unzip them into this directory
__________________________________________________________________________________________________
Microarray Heatmap
• Change working directory
> setwd("C:/Users/NAME/graph_workshop")
> getwd() # should print the workshop directory
[1] "C:/Users/NAME/graph_workshop"
> dir()
[1] "rdata_microarray.txt" "rdata_rnaseq.txt"
__________________________________________________________________________________________________
Microarray Heatmap
• Read file
• Explanation: Read the table from "rdata_microarray.txt" using tab as the
delimiter. A header row is present. The first column becomes the row names.
The result is assigned to the variable maData.
> maData <- read.table("rdata_microarray.txt", sep="\t",
header=TRUE, row.names=1)
> maData # see what is in maData variable
__________________________________________________________________________________________________
Microarray Heatmap
• Change the data into a numeric matrix
> maData <- as.matrix(maData)
> class(maData) <- "numeric"
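The read-then-convert steps can be rehearsed without the workshop files. The sketch below feeds read.table() a tiny made-up table through its text= argument, assuming the same layout as the workshop data (tab-delimited, header row, gene IDs in the first column). The names txt, df, and m are illustrative.

```r
# Tiny made-up table in the assumed layout (not the real workshop data)
txt <- "gene\tS1\tS2\tS3\ng1\t1.5\t-0.2\t0.0\ng2\t-2.1\t0.7\t3.3"

df <- read.table(text = txt, sep = "\t", header = TRUE, row.names = 1)
class(df)                 # read.table() returns a data.frame

m <- as.matrix(df)        # heatmap functions expect a matrix
class(m) <- "numeric"     # same step as on the slide: force numeric storage
is.numeric(m)
dim(m)                    # genes x samples
rownames(m)               # gene IDs taken from the first column
colnames(m)               # sample names taken from the header
```

If is.numeric(m) were FALSE after as.matrix(), at least one column was read as text, which usually points to a stray non-numeric entry in the input file.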
__________________________________________________________________________________________________
Microarray Heatmap
• Fast heatmap
> heatmap.2(maData)
__________________________________________________________________________________________________
Microarray Heatmap
• Tune heatmap
• Change color
• Rescale color
• Remove Histogram
• Smaller gene names
• Add group information
• Remove horizontal clustering
• …
__________________________________________________________________________________________________
Microarray Heatmap
• Change color
> heatmap.2(maData, col=greenred(10))
> heatmap.2(maData, col=greenred(100))
> heatmap.2(maData, col=colorRampPalette(c("blue", "black",
"yellow"))(100))
__________________________________________________________________________________________________
Microarray Heatmap
• Rescale color
> heatmap.2(maData, col=greenred(100), scale="none") # default
> heatmap.2(maData, col=greenred(100), scale="row")
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101)) # preferred: fixed, symmetric color breaks
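The breaks argument needs one more value than there are colors, which is why 100 colors go with length.out = 101. The base-R sketch below illustrates the binning. colorRampPalette() stands in for greenred() so that gplots is not needed, and the helper bin_of() is hypothetical. It clamps out-of-range values to the end bins, which is assumed here to mirror what heatmap.2 does (check ?heatmap.2).

```r
n_colors <- 100
breaks <- seq(-4, 4, length.out = n_colors + 1)   # 101 break points -> 100 bins
length(breaks)
diff(breaks)[1]                                   # width of one color bin

# Base-R stand-in for gplots::greenred(100)
pal <- colorRampPalette(c("green", "black", "red"))(n_colors)

# Hypothetical helper: index of the color bin a value falls into.
# Values outside [-4, 4] are clamped to the first or last bin.
bin_of <- function(x) {
  clamped <- pmin(pmax(x, -4), 4)
  findInterval(clamped, breaks, all.inside = TRUE)
}

bins <- bin_of(c(-10, 0.01, 10))
bins
pal[bins]                                         # colors those values would get
```

With fixed breaks, the same value always gets the same color, so several heatmaps drawn this way can share one color key.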
__________________________________________________________________________________________________
Microarray Heatmap
• Remove histogram
• trace="none" removes the trace lines drawn over
the heatmap cells
• density.info="none" removes the histogram from
the color key
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101), trace="none", density.info="none")
__________________________________________________________________________________________________
Microarray Heatmap
• Text label size
• cexRow and cexCol define the font size for
row (gene name) and column (sample name) labels
• Both are scaling factors; the default is computed from
the number of rows and columns (see ?heatmap.2)
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101), trace="none", density.info="none",
cexRow=0.5, cexCol=1)
__________________________________________________________________________________________________
Microarray Heatmap
• Add group information
• Assume G1: S1-S3; G2: S4-S6; G3: S7-S9.
• Define one color per sample (clab must exist before it is passed to heatmap.2)
> clab <- c(rep("green", 3), rep("blue", 3), rep("red", 3))
> clab # check what clab looks like
• Plot heatmap
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101), trace="none", density.info="none",
cexRow=0.5, cexCol=1, ColSideColors=clab)
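One way to build the color labels so that the sample-to-group mapping stays visible is sketched below. It assumes the grouping stated on the slide (G1: S1-S3, G2: S4-S6, G3: S7-S9). The names samples, groups, and group_colors are illustrative.

```r
samples <- paste0("S", 1:9)                      # "S1" ... "S9"
groups  <- rep(c("G1", "G2", "G3"), each = 3)    # group of each sample, in column order

# Lookup table: one color per group
group_colors <- c(G1 = "green", G2 = "blue", G3 = "red")

# One color per column of the expression matrix
clab <- unname(group_colors[groups])

# Review the mapping before plotting
data.frame(sample = samples, group = groups, color = clab)
```

The vector passed as ColSideColors must have exactly one entry per column, in the same order as the columns of the matrix.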
__________________________________________________________________________________________________
Microarray Heatmap
• Remove horizontal clustering
• Colv=FALSE disables clustering (and reordering) of the samples
• Rowv=FALSE would disable clustering of the genes
• dendrogram="row" draws only the row dendrogram. Allowed
values are "both", "row", "column", and "none".
> heatmap.2(maData, col=greenred(100), scale="none",
breaks = seq(-4, 4, length.out = 101), trace="none",
density.info="none", cexRow=0.5, cexCol=1,
ColSideColors=clab, Colv=FALSE, dendrogram="row")
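To see what the row clustering does, the sketch below runs hierarchical clustering directly on a made-up 4 x 3 matrix. It uses dist() and hclust() with their defaults (Euclidean distance, complete linkage), which are assumed here to be what heatmap.2 uses when no other functions are supplied.

```r
# Made-up expression profiles: g1/g3 look alike, and so do g2/g4
m <- rbind(g1 = c( 2.0,  2.0, -2.0),
           g2 = c(-2.0, -2.0,  2.0),
           g3 = c( 1.9,  2.1, -2.0),
           g4 = c(-2.0, -1.9,  2.1))

hc <- hclust(dist(m))          # cluster the rows (genes)
membership <- cutree(hc, k = 2)  # cut the tree into two clusters
membership
hc$order                       # row order the dendrogram imposes
```

Rows with similar profiles end up next to each other, which is what makes blocks of color appear in the heatmap. Setting Rowv=FALSE or Colv=FALSE keeps the original order of the input along that dimension instead.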
__________________________________________________________________________________________________
Microarray Heatmap
• Remove color key
• Sometimes, we want to combine multiple
heatmaps together in one figure. Therefore,
we only need to keep one color key because
the scales are all the same.
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101), trace="none", density.info="none",
cexRow=0.5, cexCol=1, ColSideColors=clab, Colv=FALSE,
dendrogram="row", key=FALSE)
__________________________________________________________________________________________________
Microarray Heatmap
• Final heatmap
__________________________________________________________________________________________________
Microarray Heatmap
• Figure requirement (general)
• Format (tiff or JPEG or …)
• 300-600 dpi
• <10 MB
• Dimension
• Background
• Compression mode
__________________________________________________________________________________________________
Microarray Heatmap
• Initiate tiff device in R
• Plot the heatmap again (this time in tiff device)
• Turn off device
> tiff(filename = "mArrayFigure.tiff", width = 4, height = 6, units = "in",
compression = "lzw", bg = "white", res = 600)
> heatmap.2(maData, col=greenred(100), scale="none", breaks =
seq(-4, 4, length.out = 101), trace="none", density.info="none",
cexRow=0.5, cexCol=1, ColSideColors=clab, Colv=FALSE,
dendrogram="row", key=FALSE)
> dev.off()
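The width, height, and res arguments together fix the pixel dimensions of the figure. The arithmetic below is a rough sketch of that relationship. The file-size estimate assumes 3 bytes per pixel (8-bit RGB) with no compression, so it is only an upper-bound illustration.

```r
width_in  <- 4      # inches, as in the tiff() call
height_in <- 6
dpi       <- 600    # the res argument

# Pixel dimensions = inches x dots per inch
px <- c(width = width_in, height = height_in) * dpi
px

# Rough uncompressed size, ASSUMING 3 bytes per pixel (8-bit RGB)
raw_mb <- prod(px) * 3 / 1024^2
round(raw_mb, 1)
```

Under that assumption, the uncompressed image would be well above a 10 MB limit, which is one reason to request lossless LZW compression as in the tiff() call.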
__________________________________________________________________________________________________
Microarray Heatmap
• mArrayFigure.tiff
__________________________________________________________________________________________________
RNA-seq Heatmap
• Prepare data
> rsData <- read.table("rdata_rnaseq.txt", sep="\t",
header=TRUE, row.names=1)
> rsData <- as.matrix(rsData)
> class(rsData) <- "numeric"
__________________________________________________________________________________________________
RNA-seq Heatmap
• Simple heatmap on rsData
• Simple heatmap on log10 transformed data
> heatmap.2(rsData)
> heatmap.2(log10(rsData+1))
__________________________________________________________________________________________________
RNA-seq Heatmap
• Production version
• Open the tiff device first, then plot, then close the device
> tiff(filename = "rnaSeq_Figure.tiff", width = 4, height = 6,
units = "in", compression = "lzw", bg = "white", res = 600)
> heatmap.2(log10(rsData+1), col=rev(heat.colors(100)),
scale="none", breaks = seq(0, 2, length.out = 101),
trace="none", density.info="none", cexRow=0.5, cexCol=1,
Colv=FALSE, dendrogram="row")
> dev.off()
__________________________________________________________________________________________________
RNA-seq Heatmap
• rnaSeq_Figure.tiff
• Weird color key position
• Gene names overlapped
__________________________________________________________________________________________________
RNA-seq Heatmap
• Heatmap layout
• Default lmat (layout matrix), one number per plot component:
4 Color key        | 3 Column dendrogram
2 Row dendrogram   | 1 Main heatmap
> ?layout
__________________________________________________________________________________________________
RNA-seq Heatmap
• Heatmap layout
• New lmat (layout matrix): the color key moves to the top right
3 Column dendrogram (empty when dendrogram="row") | 4 Color key
2 Row dendrogram                                  | 1 Main heatmap
> lmat = rbind(c(3,4),c(2,1))
> lmat # check lmat
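The two layout matrices can be compared side by side in plain R. In the sketch below, the component numbers (1 heatmap, 2 row dendrogram, 3 column dendrogram, 4 color key) follow the slides. The default is written as rbind(4:3, 2:1) to match the default layout shown there, and the helper where() is hypothetical.

```r
lmat_default <- rbind(4:3, 2:1)            # default layout as shown on the slide
lmat_new     <- rbind(c(3, 4), c(2, 1))    # layout defined on the slide

lmat_default
lmat_new

# Hypothetical helper: (row, column) grid position of a component
where <- function(lmat, id) which(lmat == id, arr.ind = TRUE)[1, ]

where(lmat_default, 4)   # color key in the default layout
where(lmat_new, 4)       # color key after the change
where(lmat_new, 1)       # the main heatmap stays in the bottom-right cell
```

Only the top row changes: the color key and the (unused) column dendrogram swap places, so the key sits above the heatmap instead of above the row dendrogram.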
__________________________________________________________________________________________________
RNA-seq Heatmap
• Do it again (open the tiff device, plot, then close the device)
> tiff(filename = "rnaSeq_Figure.tiff", width = 4, height = 6,
units = "in", compression = "lzw", bg = "white", res = 600)
> heatmap.2(log10(rsData+1), col=rev(heat.colors(100)),
scale="none", breaks = seq(0, 2, length.out = 101),
trace="none", density.info="none", cexRow=0.3, cexCol=1,
Colv=FALSE, dendrogram="row", lmat=lmat)
> dev.off()
__________________________________________________________________________________________________
RNA-seq Heatmap
• Final production version
__________________________________________________________________________________________________
Summary (heatmap)
• Lots of parameters, many of which are not
covered in these slides
• Cell size?
• Cell border?
• Row side color (mark different gene
clusters)
• …
__________________________________________________________________________________________________
Summary (heatmap)
• Always ask R first by using ?command
> ?heatmap.2
> ?tiff
> ?col
> ?log
> ?layout
> ?...
• Or ask Google
• Or UNMC bioinformatics group @GUDALAB
__________________________________________________________________________________________________
Break
__________________________________________________________________________________________________
Circos
• Free software package for data visualization
• Circular layout
• Informative
• Attractive
• Need to use Perl to run Circos
• Website: circos.ca
__________________________________________________________________________________________________
Circos (examples)
• Published Circos figure
• Ref: PLoS One 8:e72182.
• Chromosomal translocation
• A circular layout is more
advantageous for this kind of data
__________________________________________________________________________________________________
Circos (examples)
• Linear layout has limitations
http://www.nature.com/nrc/journal/v13/n7/full/nrc3537.html
__________________________________________________________________________________________________
Circos (examples)
• Linear layout has limitations
Circos website
__________________________________________________________________________________________________
Install Circos
• Download the Strawberry Perl Windows installer
at http://goo.gl/U8GnzW
• Install Perl
• Download Circos from http://goo.gl/quso0S
• Unzip the circos-0.68.tgz package to
C:\Users\NAME\
__________________________________________________________________________________________________
Install Circos
• Verify the Perl installation
• Click search and run "cmd.exe", then
type the following command in the
command prompt
> perl -v
• It should print the Perl version, along
with a short description of Perl.
__________________________________________________________________________________________________
Install Circos
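• Check Circos: list all the Perl modules it needs
(the Windows command prompt does not treat "#" as a comment, so type only the commands)
> cd C:\Users\NAME\circos-0.68
> perl bin/circos -modules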
__________________________________________________________________________________________________
Install Circos
• Check Circos (install the required Perl
modules using cpan)
> cpan Config::General
> cpan Font::TTF::Font
> cpan Math::Bezier
> cpan Math::VecStat
> cpan Regexp::Common
> cpan Set::IntSpan
> cpan Statistics::Basic
> cpan Text::Format
__________________________________________________________________________________________________
Install Circos
• Check Circos again (list all modules)
> perl bin/circos -modules
__________________________________________________________________________________________________
Circos
• Check Circos (example)
• Create a folder for this class, then run the bundled example
> mkdir workshop
> perl bin/circos -conf example/etc/circos.conf -outputdir workshop -outputfile myfirstCircos.png
• Check the figure myfirstCircos.png under the
workshop folder
__________________________________________________________________________________________________
Circos
• How does Circos work?
• A single configuration file is enough to run
Circos
> perl bin/circos -conf example/etc/circos.conf -outputdir workshop -outputfile myfirstCircos.png
__________________________________________________________________________________________________
Circos (simpler example)
__________________________________________________________________________________________________
Circos
• Create first.conf under the workshop directory
• Copy and paste the code from the previous
slides into the first.conf file
• Run
> perl bin/circos -conf workshop/first.conf -outputdir workshop -outputfile first.png
• Check first.png under the workshop directory
__________________________________________________________________________________________________
Practice (5 min)
Try changing these parameters to see how
the figure responds
REMEMBER to change the -outputfile
parameter as well
__________________________________________________________________________________________________
Another example
__________________________________________________________________________________________________
Add more details
__________________________________________________________________________________________________
Explain
• Run
> perl bin/circos -conf workshop/third.conf -outputdir workshop -outputfile third.png
• Check third.png under the workshop directory
__________________________________________________________________________________________________
Circos
• Complex Circos figures can be created using
multiple configuration files, where each
configuration file defines a different layer, style, or
figure type such as heatmap, box plot, etc.
• Learn more about Circos at http://goo.gl/oeeKBw
(full link: http://circos.ca/documentation/tutorials)
• Circos examples in these slides are modified from the Circos
tutorial at http://goo.gl/oeeKBw