Statistical tests for differential expression in cDNA microarray experiments (2): ANOVA
Xiangqin Cui and Gary A. Churchill, Genome Biology 2003, 4:210
Presented by M. Carme Ruíz de Villa and Alex Sánchez, Departament d'Estadística, U.B.
Introduction
Remember…
We want to measure how gene expression changes under different conditions.
• Only two conditions and an adequate number of replicates: t-tests and their extensions.
• More than two conditions or more than one factor: several approaches, e.g.
  • Analysis of Variance (ANOVA) (Churchill et al.)
  • Linear models (Smyth, Speed, …)
Sources of variation (1)
We want to determine when the variation due to gene expression is significant, but there are multiple sources of variation in the measurements besides gene expression itself.
We want to know when the variation in the measurements is caused by varying levels of gene expression rather than by other factors.
Sources of variation (2)
Some sources of variation in the measurements in microarray experiments are:
• Array effects
• Dye effects
• Variety effects
• Gene effects
• Combinations of the above
Relative expression values
With more than two conditions we cannot simply compute ratios. ANOVA modelling yields estimates of the relative expression for each gene in each sample.
The ANOVA model is not based on log ratios; rather, it is applied directly to (log-transformed) intensity data. However, the difference between two relative expression values can be interpreted as the mean log ratio for comparing the two samples.
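A minimal sketch of the last claim, under simplifying assumptions: hypothetical intensities for one gene, a balanced layout in which every array carries both samples, dye effects ignored, and simple means used instead of a fitted ANOVA. On the log scale the array effect cancels, so the difference of the two relative expression values equals the mean of the per-array log ratios.

```python
import math
from statistics import mean

# Hypothetical raw intensities of ONE gene; every array carries both samples
# (dye effects are ignored in this sketch).
arrays = [
    {"A": 1200.0, "B": 600.0},
    {"A": 2500.0, "B": 1100.0},
    {"A": 800.0, "B": 450.0},
]

# Work on the log2 scale, where array and sample effects are additive.
log_int = [{s: math.log2(v) for s, v in arr.items()} for arr in arrays]
grand = mean(x for arr in log_int for x in arr.values())

# Relative expression of each sample: its mean log intensity minus the grand mean.
rel = {s: mean(arr[s] for arr in log_int) - grand for s in ("A", "B")}

# Mean of the per-array log ratios log2(A/B); array effects cancel within each array.
mean_log_ratio = mean(arr["A"] - arr["B"] for arr in log_int)

print(rel["A"] - rel["B"], mean_log_ratio)
```

The two printed numbers agree up to floating-point rounding; in an unbalanced design the ANOVA estimates would be needed to obtain the same interpretation.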
Technical & biological replicates
If inference is being made on the basis of biological replicates and there is also technical replication, the technical replicates should be averaged to yield a single value for each independent biological unit.
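A small sketch of this averaging step, using hypothetical log2 values and hypothetical unit names. Each biological unit ends up contributing exactly one value, so a subsequent test counts independent units rather than repeated hybridizations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log2 measurements: (biological unit, condition, value).
# Repeated units are technical replicates of the same biological sample.
measurements = [
    ("mouse1", "control", 7.9), ("mouse1", "control", 8.1),
    ("mouse2", "control", 8.4), ("mouse2", "control", 8.2),
    ("mouse3", "treated", 9.0), ("mouse3", "treated", 9.4),
    ("mouse4", "treated", 9.1),
]

def average_technical_replicates(rows):
    """Collapse technical replicates to one (condition, mean value) per biological unit."""
    values = defaultdict(list)
    condition_of = {}
    for unit, condition, value in rows:
        values[unit].append(value)
        condition_of[unit] = condition
    return {unit: (condition_of[unit], mean(v)) for unit, v in values.items()}

per_unit = average_technical_replicates(measurements)
for unit, (cond, value) in sorted(per_unit.items()):
    print(unit, cond, round(value, 3))
```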
Review of ANOVA models
One-way ANOVA
Suppose you have a model for each measurement in your experiment:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
• $y_{ij}$: the jth measurement in the ith group.
• $\mu$: overall mean effect (constant).
• $\alpha_i$: effect of the ith group (constant).
• $\varepsilon_{ij}$: experimental error term, $\varepsilon_{ij} \sim N(0, \sigma^2)$.
Therefore, observations from group i are distributed with mean $\mu + \alpha_i$ and variance $\sigma^2$.
Hypothesis Testing
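$$\underbrace{(y_{ij} - \bar{y}_{\cdot\cdot})}_{\text{overall variability}} = \underbrace{(y_{ij} - \bar{y}_{i\cdot})}_{\text{within-group variability}} + \underbrace{(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})}_{\text{between-group variability}}$$
where $\bar{y}_{i\cdot}$ is the mean of group i and $\bar{y}_{\cdot\cdot}$ is the overall mean.
Intuition: if the between-group variability is large compared with the within-group variability, then the differences between the group means are significant.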
Sum of Squares
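$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2$$
$$\text{Within SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2$$
$$\text{Between SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
Squaring and summing the decomposition of the previous slide gives Total SS = Within SS + Between SS, because the cross-product term sums to zero.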
Mean Sum of Squares
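Between MS = Between SS / (k-1)
Within MS = Within SS / (n-k), where n = n_1 + … + n_k is the total number of observations
F = Between MS / Within MS
Under the null hypothesis of equal group means (all $\alpha_i$ equal), F follows an F distribution with k-1 and n-k degrees of freedom.
These quantities are summarized in the ANOVA table (see Example 1).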
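A minimal sketch of the computations above, applied to the normalized data of Example 1 (five arrays, three stages) and ignoring the array factor, i.e. the one-factor model. It should reproduce the sums of squares, degrees of freedom and F of the Example 1 ANOVA table, and it checks the identity Total SS = Within SS + Between SS.

```python
from statistics import mean

# Normalized data of Example 1: one list per stage, arrays 1-5 in order.
groups = {
    "I": [100, 90, 105, 83, 78],
    "II": [95, 93, 79, 85, 90],
    "III": [70, 72, 81, 74, 75],
}

def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for a one-way fixed-effects ANOVA."""
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df": (k - 1, n - k), "F": ms_between / ms_within,
    }

res = one_way_anova(groups)
print("Between SS", round(res["ss_between"], 3), "Within SS", round(res["ss_within"], 3))
print("Total SS", round(res["ss_total"], 3), "df", res["df"], "F", round(res["F"], 3))
```

The standard library has no F distribution; if SciPy is available, `scipy.stats.f.sf(res["F"], 2, 12)` gives the upper-tail p-value, which should match the Sig. column of the table.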
Multiple Factor ANOVA
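$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
The model can be extended by adding
• more factors ($\alpha$, $\beta$, …),
• interactions between them ($(\alpha\beta)$, …),
• other terms.
This is used to model the different sources of variation appearing in microarray experiments.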
Experiment 1: Latin Square
[Figure: Latin square layout of the liver and muscle samples (liver | muscle, muscle | liver)]
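Random effects models
If the k factor levels can be considered a random sample from a population of factor levels, we have a random-effects ANOVA model:
$$Y_{ij} = \mu + A_i + e_{ij}$$
where $\mu$ is the overall mean, $A_i$ is a random variable instead of a constant, and $e_{ij}$ is the experimental error, with
$$E(A_i) = 0,\quad E(e_{ij}) = 0,\quad \operatorname{var}(A_i) = \sigma_A^2,\quad \operatorname{var}(e_{ij}) = \sigma^2,$$
and $A_i$ and $e_{ij}$ independent. Hence
$$\operatorname{var}(Y_{ij}) = \sigma_A^2 + \sigma^2.$$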
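A sketch of how the two variance components can be estimated, assuming a balanced design with n observations per level. It uses the standard ANOVA (method-of-moments) estimator, which is one common choice rather than necessarily the one used by the authors: since E(Between MS) = σ² + nσ_A² and E(Within MS) = σ², we take σ̂² = Within MS and σ̂_A² = (Between MS - Within MS)/n, truncated at zero. The simulation parameters are hypothetical.

```python
import random

def simulate(k, n, mu, sigma_a, sigma, rng):
    """Draw a balanced data set from Y_ij = mu + A_i + e_ij."""
    data = []
    for _ in range(k):
        a_i = rng.gauss(0.0, sigma_a)  # random level effect with variance sigma_a**2
        data.append([mu + a_i + rng.gauss(0.0, sigma) for _ in range(n)])
    return data

def variance_components(data):
    """ANOVA (method-of-moments) estimates of (sigma_A^2, sigma^2), balanced case."""
    k, n = len(data), len(data[0])
    means = [sum(row) / n for row in data]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for row, m in zip(data, means) for y in row) / (k * (n - 1))
    # E(MS_between) = sigma^2 + n * sigma_A^2 and E(MS_within) = sigma^2
    sigma2_a_hat = max(0.0, (ms_between - ms_within) / n)  # truncated at zero
    return sigma2_a_hat, ms_within

rng = random.Random(1)
data = simulate(k=2000, n=5, mu=10.0, sigma_a=2.0, sigma=1.0, rng=rng)
s2a, s2 = variance_components(data)
print(round(s2a, 2), round(s2, 2), round(s2a + s2, 2))
```

With 2000 simulated levels the estimates land near the true σ_A² = 4 and σ² = 1, so their sum approximates var(Y_ij) = 5.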
Where to find more…
Draghici, S. (2003). Data Analysis Tools for Microarrays, Chapter 7 (ANOVA). Wiley.
Pavlidis, P. (2003). Using ANOVA for gene selection from microarray studies of the nervous system. http://microarray.cpmc.columbia.edu/pavlidis/doc/reprints/anova-methods.pdf
ANOVA Models for Microarray Data
Kerr & Churchill’s model
$$y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \varepsilon_{ijkg}$$
• $y_{ijkg}$: expression measurement from the ith array, jth dye, kth variety and gth gene.
• $\mu$: average expression over all spots.
• $A_i$: effect of the ith array.
• $D_j$: effect of the jth dye.
• $V_k$: effect of the kth variety (= treatment, sample, …).
• $G_g$: effect of the gth gene.
• $(AG)_{ig}$: interaction effect of the ith array and gth gene.
• $(VG)_{kg}$: interaction effect of the kth variety and gth gene.
• $\varepsilon_{ijkg}$: independent and identically distributed error terms.
Interpreting main effects
• A: differences in fluorescent signal from array to array (e.g. if arrays are probed under inconsistent conditions that increase or reduce hybridization of the labelled cDNA).
• D: differences between the two dye fluorescent labels (one dye may be consistently brighter than the other).
• G: differences in fluorescence between genes that are expressed at equal levels.
• V: differences in expression level between varieties (samples, tumour types, …).
Interpreting interactions
• DV: if, for a particular variety, labelling is carried out in separate runs, differences between the runs can produce pools of labelled cDNA of varying concentration or quality.
• AG (spot effect): spots for a given gene on different arrays vary in the amount of cDNA available for hybridization.
• DG: gene-specific differences between the dyes.
• VG: differences in expression for particular variety and gene combinations that are not explained by the average effects of these varieties and genes. This is the quantity of interest!
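To make the VG term concrete, here is a sketch for a balanced variety × gene layout of cell means, assuming (hypothetically) that the array and dye terms have already been averaged out. In that balanced case the least-squares estimate of the interaction under sum-to-zero constraints is the familiar $\hat{(VG)}_{kg} = \bar{y}_{kg} - \bar{y}_{k\cdot} - \bar{y}_{\cdot g} + \bar{y}_{\cdot\cdot}$. All effect values below are made up for illustration.

```python
from statistics import mean

def vg_interaction(cell_means):
    """Estimate (VG)_kg = ybar_kg - ybar_k. - ybar_.g + ybar_.. in a balanced layout.

    cell_means[k][g]: mean log intensity of variety k for gene g, assumed to be
    already averaged over arrays and dyes (hypothetical preprocessing step).
    """
    n_var, n_gene = len(cell_means), len(cell_means[0])
    grand = mean(y for row in cell_means for y in row)
    var_means = [mean(row) for row in cell_means]
    gene_means = [mean(cell_means[k][g] for k in range(n_var)) for g in range(n_gene)]
    return [[cell_means[k][g] - var_means[k] - gene_means[g] + grand
             for g in range(n_gene)] for k in range(n_var)]

# Hypothetical effects for 2 varieties and 3 genes.
V = [0.5, -0.5]
G = [8.0, 10.0, 12.0]
VG_true = [[1.0, -0.5, -0.5], [-1.0, 0.5, 0.5]]  # sums to zero over k and over g
cells = [[V[k] + G[g] + VG_true[k][g] for g in range(3)] for k in range(2)]

VG_hat = vg_interaction(cells)
print(cells)
print(VG_hat)
```

In this toy layout genes 1 and 2 have identical cell means in both varieties and only gene 0 differs; the variety main effect V absorbs the average shift, and the VG estimates recover exactly the gene-specific departures from it.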
Normalization
The A, D and V terms effectively normalize the data, so the normalization process is integrated with the data analysis.
This approach has several benefits (?):
• The normalization is based on a clearly stated set of assumptions.
• It systematically estimates the normalization parameters from all the data.
• The model can be generalized to the situation where genes are spotted multiple times on each array rather than once.
Statistically Significant Effects
• Array, dye, variety and gene effects
  • Goal: to estimate their values; there is no need to assess their significance.
  • Sometimes they do not appear at all (gene-level model).
• Array × gene and variety × gene effects
  • May or may not be present.
  • Goal: to assess their significance, i.e. to test whether the mean effect is 0 (if the effect is fixed) or the effect variance is 0 (if the effect is random).
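Test statistics: the 3 F's
Hypothesis testing involves the comparison of two models. In this setting we consider a null model of no differential expression (all VG = 0) and an alternative model with differential expression among the conditions (some VG are not equal to zero).
F statistics are computed on a gene-by-gene basis from the residual sums of squares obtained by fitting each of these models. Let $rss_0, df_0$ and $rss_1, df_1$ be the residual sum of squares and residual degrees of freedom of the null and the alternative model, and let $\sigma^2_{pool}$ be the residual variance pooled across all genes:
$$F_1 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{rss_1/df_1}$$
$$F_2 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{\left(rss_1/df_1 + \sigma^2_{pool}\right)/2}$$
$$F_3 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{\sigma^2_{pool}}$$
$F_1$ uses only the gene-specific variance, $F_3$ uses only the pooled variance, and $F_2$ averages the two.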
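A direct transcription of the three F statistics of this slide for a single gene. The per-gene fits are hypothetical numbers, and pooling by averaging the per-gene residual variances $rss_1/df_1$ across genes is an assumption made for this sketch.

```python
from statistics import mean

def f_statistics(rss0, rss1, df0, df1, sigma2_pool):
    """Return (F1, F2, F3) for one gene.

    rss0, df0: residual sum of squares and residual df of the null model (no VG terms)
    rss1, df1: the same quantities for the alternative (full) model
    sigma2_pool: residual variance pooled across all genes
    """
    numerator = (rss0 - rss1) / (df0 - df1)
    gene_var = rss1 / df1                            # gene-specific variance estimate
    f1 = numerator / gene_var
    f2 = numerator / ((gene_var + sigma2_pool) / 2)  # average of the two variances
    f3 = numerator / sigma2_pool
    return f1, f2, f3

# Hypothetical per-gene fits: (rss0, rss1, df0, df1).
fits = {"geneA": (20.0, 2.0, 10, 8), "geneB": (30.0, 24.0, 10, 8)}
# Assumption: pool by averaging the per-gene residual variances rss1/df1.
sigma2_pool = mean(rss1 / df1 for _, rss1, _, df1 in fits.values())

for name, (rss0, rss1, df0, df1) in fits.items():
    print(name, [round(f, 2) for f in f_statistics(rss0, rss1, df0, df1, sigma2_pool)])
```

For geneA the tiny gene-specific variance inflates F1 (36.0), whereas F2 and F3 shrink it toward a value based on the variance shared by all genes; this is the motivation for having more than one statistic.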
Example 1
A gene believed to be related to ovarian cancer is investigated.
• The cancer is sub-classified into 3 categories (stages): I, II, III-IV.
• 15 samples, 5 per stage, are available.
• They are labelled with 3 colors and hybridized on 4-channel cDNA arrays (1 channel empty).
(A seemingly more reasonable procedure: a double dye-swap reference design.)
Example 1. Normalized Data
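| Array | Stage I | Stage II | Stage III |
|---|---|---|---|
| 1 | 100 | 95 | 70 |
| 2 | 90 | 93 | 72 |
| 3 | 105 | 79 | 81 |
| 4 | 83 | 85 | 74 |
| 5 | 78 | 90 | 75 |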
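Example 1: ANOVA table (1)
One-factor ANOVA of the expression level (EXPRLEVE):

| Source | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Between groups | 810.133 | 2 | 405.067 | 6.505 | .012 |
| Within groups | 747.200 | 12 | 62.267 | | |
| Total | 1557.333 | 14 | | | |

[Figure: boxplots of the expression level (EXPRLEVE) by cancer stage (CANCTYP 1, 2, 3; N = 5 each)]
If the arrays are homogeneous, the appropriate model is a one-factor ANOVA.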
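Example (1): Blocking
Two-factor ANOVA with the arrays as blocks (rows, arrays 1-5) and the stage as treatment (columns); the figures correspond to stages II and III of the normalized data:

| Source of variation | Sum of squares | df | Mean square | F | p-value | Critical value of F |
|---|---|---|---|---|---|---|
| Rows (arrays) | 18.4 | 4 | 4.6 | 0.08440367 | 0.98276859 | 6.38823394 |
| Columns (stages) | 490 | 1 | 490 | 8.99082569 | 0.04000226 | 7.70864972 |
| Error | 218 | 4 | 54.5 | | | |
| Total | 726.4 | 9 | | | | |

If the arrays are not homogeneous, the appropriate model is a two-factor ANOVA (with one new blocking factor for the arrays).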
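A sketch of the blocked (two-factor, no interaction) ANOVA, assuming the table above was computed from the stage II and stage III columns of the Example 1 data with arrays as blocks; recomputing from those ten values reproduces its sums of squares and F ratios. The p-values and critical values would additionally need an F distribution (e.g. SciPy), which is not used here.

```python
from statistics import mean

# Stages II and III of Example 1; row r holds array r + 1 (the blocking factor).
table = [[95, 70], [93, 72], [79, 81], [85, 74], [90, 75]]

def randomized_block_anova(table):
    """Two-way ANOVA without interaction: rows = blocks (arrays), columns = treatments."""
    r, c = len(table), len(table[0])
    values = [y for row in table for y in row]
    grand = mean(values)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_error = ss_total - ss_rows - ss_cols          # residual after removing both factors
    df_rows, df_cols, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    return {
        "rows": (ss_rows, df_rows, (ss_rows / df_rows) / ms_error),
        "columns": (ss_cols, df_cols, (ss_cols / df_cols) / ms_error),
        "error": (ss_error, df_error),
        "total": (ss_total, r * c - 1),
    }

for source, stats in randomized_block_anova(table).items():
    print(source, [round(s, 4) for s in stats])
```

Blocking moves the array-to-array variation out of the error term; here that variation is small (rows F ≈ 0.08), so the main benefit is the paired comparison of the two stages on each array.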
Example 2: CAMDA kidney data (ftp://ftp.camda.duke.edu/CAMDA02_DATASETS/papers/README_normal.html)
• 6 mouse kidney samples (suppose they correspond to 6 different treatments)
• Compared to a common reference in a double reference design:
  • Dye swap
  • Replicate arrays
2.1. The ANOVA model
Work only at the gene level: the main effects A, D, V and G, as defined above, are not included.
$$y_{ijk} = DG_i + AG_j + VG_k + \varepsilon_{ijk}$$
with i = 1, 2 (dyes), j = 1, 2 (arrays) and k = 1, …, 6 (samples).
Example 3: a two-factor design, Diet × Strain
3.2. Design
3.3. The ANOVA model
$$y_{ijklm} = DG_i + AG_j + \text{Strain}_l + \text{Diet}_m + (\text{Strain:Diet})_{lm} + VG_k + \varepsilon_{ijklm}$$
with i = 1, 2 (dyes), j = 1, 2 (arrays), k = 1, …, 12 (samples), l = 1, …, 3 (strains) and m = 1, 2 (diets).
3.4. Sample R code (1)
    library(maanova)
    data(paigen)
    paigen <- createData(rawdata, 2)
    # full model: DG, AG, SG terms plus Strain, Diet and their interaction
    model.full.fix <- makeModel(data = paigen,
                                formula = ~DG + AG + SG + Strain + Diet + Strain:Diet)
    anova.full.fix <- fitmaanova(paigen, model.full.fix)
    # reduced model without the Strain:Diet interaction
    model.noint.fix <- makeModel(data = paigen,
                                 formula = ~DG + AG + SG + Strain + Diet)
    anova.noint.fix <- fitmaanova(paigen, model.noint.fix)
3.4. Sample R code (2)
    # permutation test for the Strain:Diet interaction effect
    test.int.fix <- ftest(paigen, model.full.fix, model.noint.fix, n.perm = 500)
    idx.int.fix <- volcano(anova.full.fix, test.int.fix, title = "Int. test")
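The `n.perm = 500` argument asks for permutation p-values. As a rough illustration of the idea only (not maanova's actual permutation scheme, which may differ), here is a generic permutation F-test for one gene under a one-way model: shuffle the group labels, recompute F, and count how often the permuted F reaches the observed one. The Example 1 values are reused, and the seed and number of permutations are arbitrary choices.

```python
import random

def f_stat(values, labels):
    """One-way ANOVA F statistic for values grouped by labels."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_f_test(values, labels, n_perm=500, seed=0):
    """Return the observed F and a permutation p-value (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = f_stat(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and values
        if f_stat(values, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

values = [100, 90, 105, 83, 78, 95, 93, 79, 85, 90, 70, 72, 81, 74, 75]
labels = ["I"] * 5 + ["II"] * 5 + ["III"] * 5
f_obs, p_perm = permutation_f_test(values, labels, n_perm=2000, seed=42)
print(round(f_obs, 3), round(p_perm, 3))
```

Permutation p-values avoid the normality assumption behind the tabulated F distribution, at the cost of extra computation per gene.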