Statistical Methods for Identifying Differentially Expressed Genes in
Replicated cDNA Microarray Experiments
Presented by Nan Lin
13 October 2002
Introduction to cDNA Microarray Experiment
Single-slide Design
– Two mRNA samples (red/green) on the same slide
Multiple-slide Design
– Two or more types of mRNA on different slides
– Exclude: time-course experiments
Examples of Multiple-slide Design
Apo AI
– Treatment group: 8 mice with the apo AI gene knocked out
– Control group: 8 C57Bl/6 mice
– Cy5: cDNA from each of the 16 mice
– Cy3: pooled cDNA from the 8 control mice
SR-BI
– Treatment group: 8 SR-BI transgenic mice
– Control group: 8 “normal” FVB mice
Microarray Setup
– 6384 spots, arranged as 4x4 grids with 19x21 spots in each
Single-slide Methods
Two types
– Based solely on intensity ratio R/G
– Take into account overall transcript abundance measured by R*G
Historical Review
– Fold increase/decrease cut-offs (1995-1996)
– Probabilistic modeling based on distributional assumptions (1997-2000)
– Consider R*G (2000-2001), e.g. Gamma-Gamma-Bernoulli
Summary of Single-slide Methods
Each method produces a model-dependent rule, which amounts to drawing two curves in the (R,G) plane
– Power (1 - Type II error rate)
– False positive rate (Type I error rate)
Multiple testing
Replication is needed because gene expression data are too noisy for single-slide rules to be reliable
Image Analysis
“Raw” data: 16-bit TIFF files
Addressing
– Within a batch, important characteristics are similar
Segmentation
– Seeded region growing algorithm
Background adjustment
– Morphological opening (a nonlinear filter)
Software package: Spot, in the R environment
Single-slide Data Display
Plot log2(R) vs. log2(G)
– variation less dependent on absolute magnitude
– normalization is additive for logged intensities
– evens out highly skewed distributions
– a more realistic sense of variation
Plot M = log2(R/G) vs. A = [log2(RG)]/2
– More revealing in terms of identifying spot artifacts and for normalization purposes
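The M and A coordinates defined on this slide can be computed per spot as in the minimal sketch below. It assumes background-corrected red and green intensities that are strictly positive; the function name `ma_values` and the example intensities are illustrative and do not come from the talk.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    M = log2(R/G) is the log-ratio; A = log2(R*G)/2 is the average
    log-intensity. Both intensities are assumed strictly positive.
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2.0

# Hypothetical intensities for three spots (not data from the talk).
spots = [(1024.0, 256.0), (500.0, 500.0), (128.0, 2048.0)]
for r, g in spots:
    m, a = ma_values(r, g)
    print(f"R={r:7.1f} G={g:7.1f}  M={m:+.2f}  A={a:.2f}")
```

Working with the logs of R and G separately, rather than forming R/G and R*G first, avoids overflow for large 16-bit intensities and makes the additive nature of normalization on the log scale explicit.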
Normalization
Identify and remove sources of systematic variation other than differential expression
– Different labeling efficiencies and scanning properties for Cy3 and Cy5
– Different scanning parameters
– Print-tip, spatial or plate effects
Red intensity is often lower than green intensity
The imbalance between R and G varies
– across spots and between arrays
– with overall spot intensity A
– with location on the array, plate origin, etc.
An Example: Self-Self Experiment
Normalization (Cont.)
Global normalization
– subtract mean or median from all intensity log-ratios
More complex normalization
– Robust locally weighted regression
– M modeled as a function of spot intensity A + location + plate origin
– Use print-tip group to represent the spot locations
– log2(R/G) → log2(R/G) - l(A,j)
– l(A,j): lowess fit of M on A for print-tip group j, computed with lowess in R (0.2<f<0.4)
Control sequences
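The two normalization schemes on this slide can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the talk: `local_linear_fit` is a simplified stand-in for R's `lowess` (tricube-weighted local linear regression with span f, but without the robustness iterations), and all function names are hypothetical.

```python
import math
import statistics

def global_median_normalize(m_values):
    """Global normalization: subtract the median log-ratio from every spot."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]

def local_linear_fit(a_values, m_values, f=0.3):
    """Fit M on A by tricube-weighted local linear regression.

    Simplified stand-in for lowess: a fraction f of the points forms each
    neighbourhood, but the robustness iterations are omitted.
    """
    n = len(a_values)
    k = min(n, max(2, math.ceil(f * n)))
    fitted = []
    for a0 in a_values:
        dist = [abs(a - a0) for a in a_values]
        h = sorted(dist)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0.0:
            w = [1.0 if d == 0.0 else 0.0 for d in dist]
        else:
            w = [max(0.0, 1.0 - (d / h) ** 3) ** 3 for d in dist]
        sw = sum(w)
        a_bar = sum(wi * a for wi, a in zip(w, a_values)) / sw
        m_bar = sum(wi * m for wi, m in zip(w, m_values)) / sw
        sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, a_values))
        sxy = sum(wi * (a - a_bar) * (m - m_bar)
                  for wi, a, m in zip(w, a_values, m_values))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted

def print_tip_normalize(m_values, a_values, tip_groups, f=0.3):
    """Return M - l(A, j), fitting l separately within each print-tip group j."""
    normalized = [0.0] * len(m_values)
    for j in set(tip_groups):
        idx = [i for i, g in enumerate(tip_groups) if g == j]
        fit = local_linear_fit([a_values[i] for i in idx],
                               [m_values[i] for i in idx], f)
        for i, l_aj in zip(idx, fit):
            normalized[i] = m_values[i] - l_aj
    return normalized
```

Global normalization removes only a constant dye bias, whereas the print-tip version removes a bias that changes with A and differs between print-tip groups. In practice a robust fit such as R's `lowess` would be preferable, since it downweights the genuinely differentially expressed spots.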
Apo AI: Normalization
Graphical Display for Test Statistics (I)
Test statistics
– Hj: no association between treatment and the expression level of gene j, j=1,…,m.
– Two-sided alternative
– Two-sample Welch t-statistics Tj
– Replication is essential to assess the variability in the treatment and control groups
– The joint null distribution of the test statistics is estimated by a permutation procedure, because the actual distribution of Tj is not a t-distribution
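A minimal sketch of the Welch t-statistic and a permutation p-value for a single gene follows. It is an illustration only: the function names are hypothetical, the example values are made up, and it handles one gene at a time, whereas estimating the joint distribution across genes would apply the same relabellings of the slides to all genes at once.

```python
import math
import statistics
from itertools import combinations

def welch_t(treatment, control):
    """Two-sample Welch t-statistic (treatment mean minus control mean).

    Assumes each group has at least two values and that the two groups
    are not both constant (otherwise the standard error is zero).
    """
    n1, n2 = len(treatment), len(control)
    se = math.sqrt(statistics.variance(treatment) / n1
                   + statistics.variance(control) / n2)
    return (statistics.mean(treatment) - statistics.mean(control)) / se

def permutation_p_value(treatment, control):
    """Two-sided permutation p-value for one gene.

    Enumerates every relabelling of the pooled values into groups of the
    original sizes and counts those with |T| at least as large as observed.
    """
    pooled = list(treatment) + list(control)
    n1 = len(treatment)
    t_obs = abs(welch_t(treatment, control))
    hits = total = 0
    for chosen in combinations(range(len(pooled)), n1):
        chosen_set = set(chosen)
        grp1 = [pooled[i] for i in chosen]
        grp2 = [v for i, v in enumerate(pooled) if i not in chosen_set]
        total += 1
        # Small tolerance so that ties with the observed value are counted.
        if abs(welch_t(grp1, grp2)) >= t_obs - 1e-12:
            hits += 1
    return hits / total

# Hypothetical normalized log-ratios for one gene (not data from the talk).
knockout = [-1.9, -2.3, -2.1, -1.7]
control = [0.1, -0.2, 0.3, 0.0]
print(welch_t(knockout, control), permutation_p_value(knockout, control))
```

With 8 treatment and 8 control slides there are 12870 relabellings, so full enumeration is feasible; for larger designs a random sample of relabellings would be the usual substitute.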
Graphical Display for Test Statistics (II)
Quantile-Quantile plots
Graphical Display for Test Statistics (III)
Plots vs. absolute expression levels
Multiple Hypothesis Testing: Adjusted p-values (I)
P-value: Pj = Pr(|Tj| >= |tj| | Hj), j=1,…,m.
Family-wise Type I Error Rate (FWER)
– The probability of at least one Type I error in the family
Strong Control of the FWER
– Control the FWER for any combination of true and false hypotheses
Weak Control of the FWER
– Control the FWER only under the complete null hypothesis that all hypotheses in the family are true
Multiple Hypothesis Testing: Adjusted p-values (II)
Adjusted p-value for Hj, written P~j
– P~j = inf{a: Hj is rejected at FWER = a}
– Hj is rejected at FWER a if P~j <= a
P-value adjustment approaches
– Bonferroni
– Sidak single-step
– Holm step-down
– Westfall and Young step-down minP
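The first three adjustments in this list depend only on the unadjusted p-values and can be sketched in a few lines. The function names and example p-values below are illustrative. The Westfall and Young step-down minP procedure is left out because it additionally needs the joint permutation distribution of the p-values.

```python
def bonferroni(p_values):
    """Single-step Bonferroni: multiply each p-value by m, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def sidak_single_step(p_values):
    """Single-step Sidak: 1 - (1 - p)^m for each p-value."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

def holm_step_down(p_values):
    """Holm step-down adjusted p-values, returned in the input order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k);
    a running maximum keeps the adjusted values monotone in the ordering.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical unadjusted p-values for three genes (not data from the talk).
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(sidak_single_step(raw))
print(holm_step_down(raw))
```

A hypothesis is then rejected at FWER level a whenever its adjusted p-value is at most a. Holm's adjusted values are never larger than Bonferroni's, which is why the step-down procedure is at least as powerful.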
Multiple Hypothesis Testing: Estimation of adjusted p-values (I)
Multiple Hypothesis Testing: Estimation of adjusted p-values (II)
Apo AI: Adjusted p-values (I)
Apo AI: Adjusted p-values (II)
Apo AI: Comparison with Single-slide Methods
Discussion
M-A plots
Normalization
– Robust local regression, e.g. lowess
Q-Q plots & Plots vs. absolute expression level
False discovery rate (FDR)
Replication is necessary
Design issues
Factorial experiments
Joint behavior of genes
R package SMA