Testing High-Dimensional Count (RNA-Seq) Data for ... · 3 Example – 3 treated vs. 4 untreated;...
Utah State University – Fall 2019
Statistical Bioinformatics (Biomedical Big Data)
Notes 6
Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression
References
- Anders & Huber (2010), “Differential Expression Analysis for Sequence Count Data”, Genome Biology 11:R106.
- DESeq2 Bioconductor package vignette, obtained in R using vignette("DESeq2").
- Kvam, Liu, and Si (2012), “A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data”, American Journal of Botany 99(2):248-256.
- Love, Huber, and Anders (2014), “Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2”, Genome Biology 15(12):550.
Example – 3 treated vs. 4 untreated; read counts (RNA-Seq) for 14,470 genes
- Published 2010 (Brooks et al., Genome Research)
- *Drosophila melanogaster*
- 3 samples “treated” by knock-down of the “pasilla” gene (thought to be involved in regulation of splicing)
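
| Gene | T1 | T2 | T3 | U1 | U2 | U3 | U4 |
|---|---|---|---|---|---|---|---|
| FBgn0000003 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| FBgn0000008 | 118 | 139 | 77 | 89 | 142 | 84 | 76 |
| FBgn0000014 | 0 | 10 | 0 | 1 | 1 | 0 | 0 |
| FBgn0000015 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| FBgn0000017 | 4852 | 4853 | 3710 | 4640 | 7754 | 4026 | 3425 |
| FBgn0000018 | 572 | 497 | 322 | 552 | 663 | 272 | 321 |
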
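# load data
# (assumes an older Bioconductor release in which the pasilla package
#  provides the pasillaGenes object and the DESeq package is installed)
library(pasilla); data(pasillaGenes)
library(DESeq)
eset <- counts(pasillaGenes)
colnames(eset) <- c('T1','T2','T3','U1','U2','U3','U4')
head(eset)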
Consider per-gene tests:
- t-test
- Nonparametric Wilcoxon rank sum test
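The t-test fails outright for a gene whose counts are constant within each group:

Error in t.test.default(x = c(2L, 2L, 2L, 2L), y = c(1L, 1L, 1L)) : data are essentially constant

| T1 | T2 | T3 | U1 | U2 | U3 | U4 |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2 | 2 | 2 | 2 |
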
# try a per-gene t-test
trt <- c(1,1,1,0,0,0,0)
pvals <- rep(NA,nrow(eset))
for(i in 1:nrow(eset)){
  x <- eset[i,]
  a1 <- t.test(x~trt)
  pvals[i] <- a1$p.value
}
# the loop stops with the error above; i shows where it failed
i # 1687
eset[i,]
#T1 T2 T3 U1 U2 U3 U4
# 1  1  1  2  2  2  2
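One way to keep such a loop going, shown here as a sketch rather than the course's code, is to wrap each test in tryCatch() and record NA whenever t.test() throws an error. The two-gene matrix is a toy: gene A mimics the constant-within-group gene above, and gene B reuses FBgn0000008's counts from the example table. safe_t_p is a hypothetical helper name.

```r
trt <- c(1, 1, 1, 0, 0, 0, 0)
toy <- rbind(A = c(1, 1, 1, 2, 2, 2, 2),           # constant within each group
             B = c(118, 139, 77, 89, 142, 84, 76))  # FBgn0000008

# Return the t-test p-value, or NA if t.test() throws an error
safe_t_p <- function(x, grp) {
  tryCatch(t.test(x ~ grp)$p.value, error = function(e) NA_real_)
}

pvals <- apply(toy, 1, safe_t_p, grp = trt)
pvals   # A is NA; B gets an ordinary Welch t-test p-value
```

Recording NA keeps the p-value vector aligned with the genes, so such genes can be inspected or filtered afterwards instead of halting the whole run.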
# try a per-gene Wilcoxon rank sum test (allowing for ties)
library(coin)
pvals <- rep(NA,nrow(eset))
for(i in 1:nrow(eset)){ # This takes a few minutes
  x <- eset[i,]
  a1 <- wilcox_test(x~as.factor(trt))
  pvals[i] <- pvalue(a1)
}
hist(pvals, main='Pvalues from Wilcoxon Rank Sum Test',
     cex.main=2, cex.lab=1.5)
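One limitation worth keeping in mind when reading this histogram (an explanation added here, not from the original notes): with 3 vs. 4 samples there are only choose(7, 3) = 35 ways to assign ranks to the treated group, so the exact two-sided rank-sum p-value can never fall below 2/35 (about 0.057), even when every treated count exceeds every untreated one. The base-R wilcox.test() call below uses hypothetical, tie-free values to show this floor. coin's wilcox_test() above defaults to a large-sample approximation, which can report somewhat smaller values, but with so few possible rank arrangements its p-values remain coarse.

```r
treated   <- c(5, 6, 7)       # hypothetical: every treated value ranks above
untreated <- c(1, 2, 3, 4)    # every untreated value (the most extreme case)

choose(7, 3)                  # 35 equally likely rank assignments under H0
p_exact <- wilcox.test(treated, untreated, exact = TRUE)$p.value
p_exact                       # equals 2/35, about 0.0571
```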
Consider data as counts (Poisson regression). On a per-gene basis:
- Let $N_i$ = # of total fragments counted in sample $i$
- Let $p_i = P\{\text{fragment matches to gene in sample } i\}$
- Observed # of reads for the gene in sample $i$: $R_i \sim \text{Poisson}(N_i p_i)$, so $E[R_i] = \text{Var}[R_i] = N_i p_i$
- Let $T_i$ = indicator of trt. status (0/1) for sample $i$, and assume $\log(p_i) = \beta_0 + \beta_1 T_i$
- Test for DE using $H_0: \beta_1 = 0$
Poisson Regression

$$E[R_i] = N_i p_i = N_i \exp(\beta_0 + \beta_1 T_i)$$

$$\log(E[R_i]) = \log N_i + \beta_0 + \beta_1 T_i$$

- $\log N_i$ is not interesting, but important: call this the “offset” (often considered the “exposure” for sample $i$; a quasi-normalization to scale for overall genomic material)
- The $\beta$’s are estimated using an iterative MLE procedure

Do this for one gene in R (here, gene 2):
trt <- c(1,1,1,0,0,0,0)
R <- eset[2,]
lExposure <- log(colSums(eset))
a1 <- glm(R ~ trt, family=poisson, offset=lExposure)
summary(a1)
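The “iterative MLE procedure” glm() runs for this model is iteratively reweighted least squares (Fisher scoring). The sketch below runs those updates by hand; it illustrates the algorithm and is not glm()'s actual code. It fits gene 2 (FBgn0000008) using only the six-gene excerpt from the example table, so the offset is the log column total of that excerpt, an assumption standing in for log(colSums(eset)) over all 14,470 genes. It also checks a closed form that holds for a single 0/1 covariate: exp(β0) and exp(β0 + β1) equal each group's total count divided by its total exposure.

```r
# Six-gene excerpt of the pasilla counts (columns T1-T3, U1-U4)
toy <- rbind(
  FBgn0000003 = c(0, 1, 1, 0, 0, 0, 0),
  FBgn0000008 = c(118, 139, 77, 89, 142, 84, 76),
  FBgn0000014 = c(0, 10, 0, 1, 1, 0, 0),
  FBgn0000015 = c(0, 0, 0, 0, 0, 1, 2),
  FBgn0000017 = c(4852, 4853, 3710, 4640, 7754, 4026, 3425),
  FBgn0000018 = c(572, 497, 322, 552, 663, 272, 321)
)
trt <- c(1, 1, 1, 0, 0, 0, 0)
R <- toy["FBgn0000008", ]
lExposure <- log(colSums(toy))   # assumption: exposure from the excerpt only

# Fisher scoring / IRLS for a Poisson GLM with log link and an offset
poisson_irls <- function(y, X, offset, tol = 1e-10, max_iter = 50) {
  beta <- c(log(sum(y) / sum(exp(offset))), rep(0, ncol(X) - 1))
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta) + offset        # linear predictor incl. offset
    mu  <- exp(eta)                         # fitted means
    z   <- (eta - offset) + (y - mu) / mu   # working response, offset removed
    w   <- mu                               # working weights for the log link
    beta_new  <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

X <- cbind("(Intercept)" = 1, trt = trt)
beta_irls <- poisson_irls(R, X, lExposure)
beta_glm  <- coef(glm(R ~ trt, family = poisson, offset = lExposure))

# Closed form for one 0/1 covariate: group total / group exposure
N  <- colSums(toy)
b0 <- log(sum(R[trt == 0]) / sum(N[trt == 0]))
b1 <- log(sum(R[trt == 1]) / sum(N[trt == 1])) - b0

print(rbind(irls = beta_irls, glm = beta_glm, closed_form = c(b0, b1)))
```

The closed form follows from the Poisson score equations, which force each group's fitted total to equal its observed total; with more covariates no such shortcut exists and the iteration is needed.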
Call:
glm(formula = R ~ trt, family = poisson, offset = lExposure)

Deviance Residuals:
     T1       T2       T3       U1       U2       U3       U4
 0.3690   0.4516  -0.9047  -0.7217   0.5862   2.3048  -2.5286

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.85250    0.06804 -174.19   <2e-16 ***
trt           0.05875    0.10304    0.57    0.569
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 14.053  on 6  degrees of freedom
Residual deviance: 13.729  on 5  degrees of freedom
AIC: 58.17

Number of Fisher Scoring iterations: 4
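Reading the trt row on the count scale (a worked step added here, using only the printed numbers): the coefficient is a log rate ratio, and the z value is the estimate divided by its standard error.

$$\widehat{\text{RR}} = e^{\hat\beta_1} = e^{0.05875} \approx 1.06, \qquad z = \frac{\hat\beta_1}{\widehat{\text{SE}}(\hat\beta_1)} = \frac{0.05875}{0.10304} \approx 0.57$$

So, under the Poisson model, gene 2's rate is estimated to be about 6% higher in treated samples, far from significant (p = 0.569).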
Do this for all genes …
jackpot?
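Possible (frequent) problem: overdispersion
- Recall the [implicit] assumption for the Poisson dist’n: $E[R_i] = \text{Var}[R_i] = N_i p_i$
- It can sometimes happen that $\text{Var}[R_i] > E[R_i]$
- Common check: add a scale (or dispersion) parameter $\sigma$, so that $\text{Var}[R_i] = \sigma^2 E[R_i]$
- Estimate $\sigma^2$ as $\chi^2_D/\text{df}$, where the deviance $\chi^2_D$ is a goodness-of-fit statistic:

$$\chi^2_D = 2 \sum_i R_i \log\left(\frac{R_i}{\hat{R}_i}\right)$$

(with $0 \log 0 = 0$; the general Poisson deviance also contains $-2\sum_i (R_i - \hat{R}_i)$, which is zero for a log-link model with an intercept)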
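As a check on the formula, the sketch below computes $\chi^2_D$ by hand for two genes of the six-gene excerpt (so the offset again comes from that excerpt, an assumption) and compares it with glm()'s residual deviance; it then forms the scale estimate $\hat\sigma = \sqrt{\chi^2_D/\text{df}}$ used in the all-genes code. For gene 2 on the full data, the glm output shown earlier gives $\hat\sigma^2 = 13.729/5 \approx 2.75$, i.e. $\hat\sigma \approx 1.66$, well above 1.

```r
toy <- rbind(
  FBgn0000003 = c(0, 1, 1, 0, 0, 0, 0),
  FBgn0000008 = c(118, 139, 77, 89, 142, 84, 76),
  FBgn0000014 = c(0, 10, 0, 1, 1, 0, 0),
  FBgn0000015 = c(0, 0, 0, 0, 0, 1, 2),
  FBgn0000017 = c(4852, 4853, 3710, 4640, 7754, 4026, 3425),
  FBgn0000018 = c(572, 497, 322, 552, 663, 272, 321)
)
trt <- c(1, 1, 1, 0, 0, 0, 0)
lExposure <- log(colSums(toy))   # assumption: exposure from the excerpt only

# Deviance chi-square 2 * sum R * log(R / R_hat), with 0 * log(0) taken as 0
deviance_chisq <- function(R, R_hat) {
  2 * sum(ifelse(R > 0, R * log(R / R_hat), 0))
}

for (gene in c("FBgn0000008", "FBgn0000014")) {
  R   <- toy[gene, ]
  fit <- glm(R ~ trt, family = poisson, offset = lExposure)
  chi2_D    <- deviance_chisq(R, fitted(fit))
  sigma_hat <- sqrt(chi2_D / df.residual(fit))
  cat(gene, ": by hand =", round(chi2_D, 4),
      "| glm deviance =", round(deviance(fit), 4),
      "| sigma-hat =", round(sigma_hat, 3), "\n")
}
```

FBgn0000014 is included because its zero counts exercise the $0 \log 0 = 0$ convention.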
# Poisson regression for all genes, checking for overdispersion
Poisson.p <- scale <- rep(NA,nrow(eset))
lExposure <- log(colSums(eset))
trt <- c(1,1,1,0,0,0,0)
## this next part takes about 1.5 minutes
print(date())
for(i in 1:nrow(eset)){
  count <- eset[i,]
  a1 <- glm(count ~ trt, family=poisson, offset=lExposure)
  Poisson.p[i] <- summary(a1)$coefficients[2,4]  # p-value for trt
  scale[i] <- sqrt(a1$deviance/a1$df.residual)   # sigma-hat = sqrt(chi2_D/df)
}
print(date())
par(mfrow=c(2,2))
hist(Poisson.p, main='Poisson', xlab='raw P-value')
boxplot(scale, main='Poisson', xlab='scale estimate'); abline(h=1,lty=2)
mean(scale > 1)
# 0.640152
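One common adjustment once overdispersion is found, sketched here as an aside, is a quasi-Poisson fit: it keeps the Poisson mean model but multiplies standard errors by $\hat\sigma$. Note that R's quasipoisson family estimates $\hat\sigma^2$ from the Pearson $\chi^2$ (sum of squared Pearson residuals over df), not from the deviance $\chi^2_D$ used above. The example again uses gene 2 from the six-gene excerpt with an excerpt-based offset (an assumption).

```r
toy <- rbind(
  FBgn0000003 = c(0, 1, 1, 0, 0, 0, 0),
  FBgn0000008 = c(118, 139, 77, 89, 142, 84, 76),
  FBgn0000014 = c(0, 10, 0, 1, 1, 0, 0),
  FBgn0000015 = c(0, 0, 0, 0, 0, 1, 2),
  FBgn0000017 = c(4852, 4853, 3710, 4640, 7754, 4026, 3425),
  FBgn0000018 = c(572, 497, 322, 552, 663, 272, 321)
)
trt <- c(1, 1, 1, 0, 0, 0, 0)
lExposure <- log(colSums(toy))   # assumption: exposure from the excerpt only
R <- toy["FBgn0000008", ]

pois  <- glm(R ~ trt, family = poisson,      offset = lExposure)
quasi <- glm(R ~ trt, family = quasipoisson, offset = lExposure)

# Pearson-based dispersion estimate, as used by quasipoisson
phi_pearson <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
phi_quasi   <- summary(quasi)$dispersion

# Same coefficients; standard errors inflated by sqrt(phi)
se_pois  <- summary(pois)$coefficients["trt", "Std. Error"]
se_quasi <- summary(quasi)$coefficients["trt", "Std. Error"]
c(phi_pearson = phi_pearson, phi_quasi = phi_quasi,
  se_ratio = se_quasi / se_pois, sqrt_phi = sqrt(phi_quasi))
```

This only patches the variance with one constant per gene; the negative binomial models below instead let the extra variance grow with the mean.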
Can use an alternative distribution; the edgeR package does this:
- For each gene: $R_i \sim \text{NegativeBinomial}$ (number of indep. Bernoulli trials needed to achieve a fixed number of successes)
- Let $\mu_i = E[R_i]$ and $v_i = \text{Var}[R_i]$; but low sample sizes prevent reliable estimation of $\mu_i$ and $v_i$
- Assume $v_i = \mu_i + \alpha \mu_i^2$
- Estimate $\alpha$ by pooling information across genes; then only the mean must be estimated separately for each gene

But the DESeq2 package improves on this.
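A quick numerical check of the mean-variance relation, as a sketch: R's dnbinom() with the (size, mu) parameterization has variance mu + mu^2/size, so setting size = 1/α gives $v = \mu + \alpha\mu^2$. The values of μ and α below are arbitrary illustrations.

```r
# Variance of an NB(mu, alpha) distribution, computed by summing over its pmf
nb_variance <- function(mu, alpha, upper = 1e5) {
  x <- 0:upper
  p <- dnbinom(x, size = 1 / alpha, mu = mu)   # size = 1/alpha
  m <- sum(x * p)
  sum((x - m)^2 * p)
}

mu <- 50
for (alpha in c(0.01, 0.1, 0.5)) {
  cat("alpha =", alpha,
      "| numeric var =", round(nb_variance(mu, alpha), 4),
      "| mu + alpha*mu^2 =", mu + alpha * mu^2,
      "| Poisson var =", mu, "\n")
}
```

Even a modest α adds a term that dominates the Poisson part once μ is large, which is why highly expressed genes show the most overdispersion.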
Negative Binomial (NB) using DESeq2 …
- Define the trt. condition of sample $i$: $\rho(i)$
- Define the # of fragment reads in sample $i$ for gene $k$: $R_{ki} \sim \text{NB}(\mu_{ki}, \sigma^2_{ki})$
- Assumptions in estimating $\mu_{ki}$ and $\sigma^2_{ki}$:

$$\mu_{ki} = q_{k,\rho(i)}\, s_i$$

- $s_i$: library size, prop. to coverage [exposure] in sample $i$
- $q_{k,\rho(i)}$: per-gene abundance, prop. to true conc. of fragments

$$\sigma^2_{ki} = \mu_{ki} + s_i^2\, v_{k,\rho(i)}$$

- $s_i^2\, v_{k,\rho(i)}$: raw variance (biological variability)
- $\mu_{ki}$: “shot noise”; this “dominates” for low-expressed genes

$$v_{k,\rho(i)} = v_\rho\left(q_{k,\rho(i)}\right)$$

- $v_\rho$: smooth function; pool information across genes to estimate variance
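To connect this with the edgeR-style relation $v_i = \mu_i + \alpha\mu_i^2$ above (a derivation added here, assuming for illustration a quadratic raw-variance function $v_\rho(q) = \alpha q^2$), substitute $\mu_{ki} = q_{k,\rho(i)} s_i$:

$$\sigma^2_{ki} = \mu_{ki} + s_i^2\, \alpha\, q_{k,\rho(i)}^2 = \mu_{ki} + \alpha \left(s_i\, q_{k,\rho(i)}\right)^2 = \mu_{ki} + \alpha\, \mu_{ki}^2$$

So the single-$\alpha$ model is the special case of a quadratic $v_\rho$; a general smooth $v_\rho$ lets the variance-mean relation bend with expression level.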
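Estimate parameters (for NB distn.)

$$\hat{s}_i = \operatorname{median}_{k} \frac{R_{ki}}{\left(\prod_{j=1}^{m} R_{kj}\right)^{1/m}}$$

- $m$ = # samples; $n$ = # genes
- The denominator is the geometric mean of gene $k$ across samples, like a pseudo-reference sample
- $\hat{s}_i$ plays essentially the same role as the total count $\sum_k R_{ki}$ (library size), with robustness against very large $R_{ki}$ for some $k$
- For the median calculation, skip genes where the geometric mean (denominator) is zero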
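The median-of-ratios estimate can be sketched in a few lines of base R (a sketch, not DESeq2's own code). It works on the log scale and drops genes whose geometric mean is zero, i.e. genes with any zero count. Applied to the six-gene excerpt, only three genes qualify, so that result is merely illustrative; the second example checks the defining property on an artificial matrix.

```r
# Median-of-ratios size factors: s_i = median_k R_ki / geometric_mean_k
size_factors <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))      # -Inf when a gene has any zero count
  keep <- is.finite(log_geo_mean)            # skip genes with geometric mean 0
  apply(counts[keep, , drop = FALSE], 2, function(col)
    exp(median(log(col) - log_geo_mean[keep])))
}

toy <- rbind(
  FBgn0000003 = c(0, 1, 1, 0, 0, 0, 0),
  FBgn0000008 = c(118, 139, 77, 89, 142, 84, 76),
  FBgn0000014 = c(0, 10, 0, 1, 1, 0, 0),
  FBgn0000015 = c(0, 0, 0, 0, 0, 1, 2),
  FBgn0000017 = c(4852, 4853, 3710, 4640, 7754, 4026, 3425),
  FBgn0000018 = c(572, 497, 322, 552, 663, 272, 321)
)
colnames(toy) <- c("T1", "T2", "T3", "U1", "U2", "U3", "U4")
round(size_factors(toy), 3)

# Columns that are exact multiples (1, 2, 4) of one profile recover those
# multiples relative to their geometric mean (which is 2): 0.5, 1, 2
prof <- outer(c(10, 20, 30, 40, 50), c(1, 2, 4))
size_factors(prof)
```

Because each factor is a median over genes, a handful of extremely high-count genes cannot drag it the way they would drag a column total.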
Estimate parameters (for NB distn.)

$$\hat{q}_{k\rho} = \frac{1}{m_\rho} \sum_{i:\rho(i)=\rho} \frac{R_{ki}}{\hat{s}_i}$$

- $m_\rho$ = # samples in trt. condition $\rho$
- This is the mean of the standardized counts from the samples in treatment condition $\rho$
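Estimate parameters (for NB distn.)

Estimate the function $w_\rho$ by plotting $\hat{w}_{k\rho}$ vs. $\hat{q}_{k\rho}$, and use the parametric dispersion-mean relation

$$w_\rho(\hat{q}_{k\rho}) = \alpha_0 + \alpha_1 / \hat{q}_{k\rho}$$

($\alpha_0$ is the “asymptotic dispersion”; $\alpha_1$ is the “extra Poisson” term), where

$$\hat{w}_{k\rho} = \frac{1}{m_\rho - 1} \sum_{i:\rho(i)=\rho} \left(\frac{R_{ki}}{\hat{s}_i} - \hat{q}_{k\rho}\right)^2$$

(this is the variance of the standardized counts from the samples in trt. condition $\rho$)

$$z_{k\rho} = \frac{\hat{q}_{k\rho}}{m_\rho} \sum_{i:\rho(i)=\rho} \frac{1}{\hat{s}_i}$$

(an un-biasing constant)

$$\hat{v}_{k\rho} = \max\left(\hat{w}_{k\rho},\; w_\rho(\hat{q}_{k\rho})\right) - z_{k\rho}$$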
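For one condition, the per-gene quantities $\hat{q}_{k\rho}$, $\hat{w}_{k\rho}$ and $z_{k\rho}$ can be computed directly from a count matrix and its size factors; they are the inputs to the final max(...) minus z step. The sketch below assumes the size factors are already known: it uses the treated columns of three genes from the example table with hypothetical size factors. It illustrates the formulas above and is not DESeq's implementation.

```r
# Per-gene moments for one condition rho, given that condition's counts
# (genes x samples) and its size factors s
condition_moments <- function(counts, s) {
  m_rho <- ncol(counts)
  std   <- sweep(counts, 2, s, "/")                 # R_ki / s_i
  q_hat <- rowMeans(std)                            # mean of standardized counts
  w_hat <- rowSums((std - q_hat)^2) / (m_rho - 1)   # their sample variance
  z     <- q_hat / m_rho * sum(1 / s)               # un-biasing (shot-noise) term
  cbind(q_hat = q_hat, w_hat = w_hat, z = z)
}

# Treated columns (T1-T3) of three genes from the example table
toy_T <- rbind(
  FBgn0000008 = c(118, 139, 77),
  FBgn0000017 = c(4852, 4853, 3710),
  FBgn0000018 = c(572, 497, 322)
)
s_T <- c(1.1, 1.0, 0.8)   # hypothetical size factors for T1, T2, T3
round(condition_moments(toy_T, s_T), 2)
```

With all size factors equal to 1, $z_{k\rho}$ reduces to $\hat{q}_{k\rho}$, the Poisson variance of a count with that mean, which is what subtracting it removes from $\hat{w}_{k\rho}$.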
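Estimating Dispersion in DESeq2
1. Estimate a dispersion value $\hat{w}_{k\rho}$ for each gene
2. Fit, for each condition (or for pooled conditions [default]), a curve $w_\rho(\hat{q}_{k\rho})$ through the estimates (in the $\hat{w}_{k\rho}$ vs. $\hat{q}_{k\rho}$ plot)
3. Assign to each gene a dispersion value, using the maximum of the estimated [empirical] value $\hat{w}_{k\rho}$ or the fitted value $w_\rho(\hat{q}_{k\rho})$; this conservative approach avoids under-estimating dispersion (which would increase false positives)

(This maximum rule is the one used by the original DESeq package (Anders & Huber, 2010). DESeq2 (Love, Huber, and Anders, 2014) instead shrinks the gene-wise estimates toward the fitted curve with an empirical Bayes step, keeping the gene-wise value only for genes whose estimates lie far above the curve.)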
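Steps 2 and 3 can be sketched with base R. The relation $w = \alpha_0 + \alpha_1/q$ is linear in $1/q$, so ordinary least squares on that predictor gives a simple fit. DESeq2's own parametric fit uses a gamma-family GLM and iteratively drops outlying genes, so treat this as a simplification. The gene-wise values are hypothetical: they follow the curve with $\alpha_0 = 0.05$ and $\alpha_1 = 2$, apart from made-up multiplicative scatter.

```r
q_hat  <- c(2, 5, 10, 50, 100, 500, 1000, 5000)
wiggle <- c(1.3, 0.8, 1.1, 0.9, 1.2, 0.85, 1.05, 0.95)   # hypothetical scatter
w_hat  <- (0.05 + 2 / q_hat) * wiggle                    # gene-wise estimates

# Step 2: fit w = alpha0 + alpha1 / q by least squares
trend <- lm(w_hat ~ I(1 / q_hat))
alpha <- coef(trend)
w_fit <- unname(alpha[1] + alpha[2] / q_hat)

# Step 3: conservative choice, max(empirical, fitted)
w_final <- pmax(w_hat, w_fit)
round(cbind(q_hat, w_hat, w_fit, w_final), 4)
```

Genes whose empirical value falls below the curve are raised to it; genes above it keep their own larger value.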
Getting started with the DESeq2 package
- Data in this format (like the pasilla count table shown earlier): integer counts in matrix form, with columns for samples and rows for genes
- Row names correspond to genes (or at least genomic regions)
- See the package vignette for suggestions on how to get to this format (including from sequence alignments and annotation)
- Can use the read.csv or read.table functions to read in text files
- Each column is a biological rep; if you have technical reps, sum them together to get a single column
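A minimal sketch of the last two points, using read.table() on inline tab-separated text (standing in for a file on disk) and summing technical replicates with rowsum(). The replicate split is hypothetical: columns T1a and T1b are made-up technical reps whose sums match T1 in the example table. DESeq2 also offers a collapseReplicates() helper for DESeqDataSet objects.

```r
# Inline text standing in for a tab-separated counts file (hypothetical layout)
txt <- "gene\tT1a\tT1b\tU1
FBgn0000008\t60\t58\t89
FBgn0000017\t2400\t2452\t4640
FBgn0000018\t300\t272\t552"

# Gene IDs become row names; the result is an integer count matrix
cts <- as.matrix(read.table(text = txt, header = TRUE, row.names = 1, sep = "\t"))

# Map each column to its biological replicate, then sum within replicate
rep_of <- c(T1a = "T1", T1b = "T1", U1 = "U1")
merged <- t(rowsum(t(cts), group = rep_of[colnames(cts)]))
merged
```

rowsum() sums rows within groups, so the matrix is transposed before and after to sum columns instead.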
# format data
library(DESeq2)
countsTable <- eset
# counts table needs gene IDs in row names
rownames(countsTable) <- rownames(eset)
dim(countsTable) # 14470 genes, 7 samples
# 3 treated, 4 untreated; put in data.frame (as a factor):
conds <- c("T","T","T","U","U","U","U")
cframe <- data.frame(conds = factor(conds))
# Fit DESeq model (after formatting object):
dds <- DESeqDataSetFromMatrix(countsTable, colData=cframe,
                              design = ~ conds)
ddsCtrst <- DESeq(dds)
# check quality of dispersion estimation
par(mfrow=c(1,1))
plotDispEsts(ddsCtrst, cex.lab=1.5)
Checking Quality of Dispersion Estimation
- Plot $\hat{w}_{k\rho}$ vs. $\hat{q}_{k\rho}$ (both axes log-scale here)
- Add the fitted line for $w_\rho(\hat{q}_{k\rho})$
- Check that the fitted line roughly follows the general trend
Test for DE between conditions
Based on contrasts (coming more formally in Notes 7, slides 14-20)
- Peak near zero: DE genes
- Peak nearer one: low-count genes (?)
- Default adjustment: BH FDR (padj is NA for genes removed by DESeq2's independent filtering of low-count genes)
log2 fold change (MLE): conds T vs U
Wald test p-value: conds T vs U
DataFrame with 6 rows and 4 columns (all columns numeric)

| Gene | baseMean | log2FoldChange | pvalue | padj |
|---|---|---|---|---|
| FBgn0000003 | 0.1594687 | 0.95577724 | 0.80202750 | NA |
| FBgn0000008 | 52.2256776 | 0.02806414 | 0.92576489 | 0.9892560 |
| FBgn0000014 | 0.3897080 | 0.74861167 | 0.81899159 | NA |
| FBgn0000015 | 0.9053584 | -0.81010553 | 0.67840751 | NA |
| FBgn0000017 | 2358.2434078 | -0.27580756 | 0.03285053 | 0.2400995 |
| FBgn0000018 | 221.2415562 | -0.11987673 | 0.50758039 | 0.8708435 |

# test for DE (Wald test, z = est/se{est})
res <- results(ddsCtrst, contrast=c("conds","T","U"))
# see results
# (partial columns here just for convenience)
head(res)[,c(1,2,5,6)]
hist(res$pvalue, xlab='raw P-value', cex.lab=1.5, cex.main=2,
     main='DESeq2, Wald test')
# check to explain missing p-values
t <- is.na(res$pvalue)
sum(t) # 2638, or about 18.2% here
boxplot(res$baseMean[t], cex=2, pch=16)
# -- almost always, only happens for undetected genes
# define sig DE genes
padj <- p.adjust(res$pvalue, "fdr")
t <- padj < .05 & !is.na(padj)
gn.sig <- rownames(res)[t]
length(gn.sig) # 561
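p.adjust(..., "fdr") is the Benjamini-Hochberg step-up adjustment ("fdr" is an alias for "BH"). The sketch below reproduces it by hand: scale the p-value ranked $r$ among $m$ by $m/r$, take cumulative minima starting from the largest p-value, and cap at 1. NA p-values stay NA and are excluded from $m$, matching p.adjust's default behavior. The p-values are made up for illustration.

```r
# Benjamini-Hochberg adjusted p-values, computed by hand
bh_adjust <- function(p) {
  out <- rep(NA_real_, length(p))
  ok  <- !is.na(p)
  q   <- p[ok]
  m   <- length(q)
  o   <- order(q, decreasing = TRUE)           # largest p-value first
  adj <- pmin(1, cummin(m / (m:1) * q[o]))     # ranks of q[o] are m, m-1, ..., 1
  out[ok] <- adj[order(o)]                     # undo the sort
  out
}

p <- c(0.001, 0.04, NA, 0.03, 0.5, 0.012, 0.9)  # hypothetical raw p-values
cbind(raw = p, by_hand = bh_adjust(p), p.adjust = p.adjust(p, "BH"))
```

The cumulative minimum is what makes the adjusted values monotone in the raw p-values, so a gene can never be called significant while a gene with a smaller raw p-value is not.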
# check p-value peak nearer 1
counts <- rowMeans(eset)
t <- res$pvalue > 0.8 & !is.na(res$pvalue)
par(mfrow=c(2,2))
hist(log(counts[t]), xlab='[logged] mean count',
     main='Genes with largest p-values')
hist(log(counts[!t]), xlab='[logged] mean count',
     main='Genes with NOT largest p-values')
# -- tends to be genes with smaller overall counts
Same example, but with an extra covariate
- 3 samples “treated” by knock-down of the “pasilla” gene, 4 samples “untreated”
- Of the 3 “treated” samples, 1 was “single-read” and 2 were “paired-end” types
- Of the 4 “untreated” samples, 2 were “single-read” and 2 were “paired-end” types
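
| Gene | TS1 | TP1 | TP2 | US1 | US2 | UP1 | UP2 |
|---|---|---|---|---|---|---|---|
| FBgn0000003 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| FBgn0000008 | 118 | 139 | 77 | 89 | 142 | 84 | 76 |
| FBgn0000014 | 0 | 10 | 0 | 1 | 1 | 0 | 0 |
| FBgn0000015 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| FBgn0000017 | 4852 | 4853 | 3710 | 4640 | 7754 | 4026 | 3425 |
| FBgn0000018 | 572 | 497 | 322 | 552 | 663 | 272 | 321 |
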
# load data; recall eset object from previous slides
colnames(eset) <- c('TS1','TP1','TP2','US1','US2','UP1','UP2')
head(eset)
# format data and fit model
countsTable <- eset
rownames(countsTable) <- rownames(eset)
trt <- c("T","T","T","U","U","U","U")
type <- c("S","P","P","S","S","P","P")
cframe <- data.frame(trt = factor(trt), type = factor(type))
dds <- DESeqDataSetFromMatrix(countsTable, colData=cframe,
                              design = ~ trt + type)
ddsCtrst <- DESeq(dds)
# trt effect (T vs U), adjusted for read type
res <- results(ddsCtrst, contrast=c("trt","T","U"))
pvals <- res$pvalue
# Visualize sig. results
par(mfrow=c(1,1))
hist(pvals, xlab='Raw p-value', cex.lab=1.5, cex.main=2,
     main='Test trt effect while accounting for type')
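To see what the additive design ~ trt + type encodes, base R's model.matrix() can build the corresponding design matrix for these seven samples. The column names follow R's default treatment coding, with the alphabetically first level (T, and P) as reference; DESeq2 builds an analogous matrix from the design formula, and the contrast c("trt","T","U") corresponds to the negative of the trtU coefficient shown here.

```r
trt  <- factor(c("T", "T", "T", "U", "U", "U", "U"))
type <- factor(c("S", "P", "P", "S", "S", "P", "P"))

X <- model.matrix(~ trt + type)
rownames(X) <- c("TS1", "TP1", "TP2", "US1", "US2", "UP1", "UP2")
X
# (Intercept): treated, paired-end baseline
# trtU:  untreated vs treated, holding read type fixed
# typeS: single-read vs paired-end, holding treatment fixed
```

Because each treatment group contains both read types, the trt and type columns are not collinear, so both effects can be estimated with 7 samples.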
# Get sig. genes
adj.pvals <- p.adjust(pvals, "BH")
t <- adj.pvals < .05 & !is.na(adj.pvals)
sum(t) # 708
sig.gn <- rownames(eset)[t]
# Visualize sig. genes
library(RColorBrewer)
small.eset <- eset[t,]
hmcol <- colorRampPalette(brewer.pal(9,"Reds"))(256)
csc <- rep(hmcol[250],ncol(small.eset))   # treated samples: dark red bar
csc[trt=="U"] <- hmcol[10]                # untreated samples: light bar
heatmap(small.eset,scale="row",col=hmcol,
        ColSideColors=csc, cexCol=2.5,main=paste(sum(t),'Sig. Genes'))
Summary
- Test count (RNA-Seq) data using the Negative Binomial distribution (DESeq2 approach, using contrasts), pooling information across genes

What next?
- Adjust for multiple testing
- Filtering (to increase statistical power): zero-count genes?
- Visualization: heatmaps / clustering / PCA biplot / others
- Characterize significant genes (annotations)