Models and methods for summarizing GeneChip probe set data
![Page 1: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/1.jpg)
1
Models and methods for summarizing
GeneChip probe set data
![Page 2: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/2.jpg)
2
Some Gene Expression Analysis Tasks
• Detection of gene expression – presence calls.
• Differential expression detection – comparative calls.
• Measurement of gene expression.
![Page 3: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/3.jpg)
3
Objective:
To compute probe set summaries that are good indicators of gene expression from the background-corrected, normalized, perfect match probe intensities for a set of arrays:
$PM^*_{ijk}$, $i=1,\dots,I$, $j=1,\dots,J$, $k=1,\dots,K$,
where $i$ indexes probes within a probe set, $j$ indexes arrays and $k$ indexes probe sets.
![Page 4: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/4.jpg)
4
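Affymetrix spike-in data set used for illustration: 14 genes spiked in at different concentrations into a common pool of pancreas cRNA

| Probe set | A | B | C | D | E | F | G | H | I | J | K | L | M,N,O,P | Q,R,S,T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37777_at | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
| 684_at | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 |
| 1597_at | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 |
| 38734_at | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 |
| 39058_at | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 |
| 36311_at | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 |
| 36889_at | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 |
| 1024_at | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 |
| 36202_at | 32 | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 |
| 36085_at | 64 | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 |
| 40322_at | 128 | 256 | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
| 407_at | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
| 1091_at | 512 | 1024 | 0 | 0.25 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
| 1708_at | 1024 | 0 | 0.3 | 0.5 | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 |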
![Page 5: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/5.jpg)
5
Affy comment on non-responding probes:
• Affymetrix: “Certain probe pairs for 407_at and 36889_at do not work well. It is recommended that these two probe sets be excluded for final statistical tally.”
![Page 6: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/6.jpg)
6
To log or not to log?
In addition to providing good expression values, we would like the model to be easy to understand and analyse. Ideally we would fit a standard linear model, which assumes:
• Homogeneity of variance
• Additivity
• Normality
![Page 7: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/7.jpg)
7
Homogeneity of variance
Look at the association between the variance and the mean of the intensities: for probe sets spanning the range of intensities, plot the IQR of PM* across the 59 replicates against the median of PM* across the same 59 replicates.
Repeat for log2(PM*).
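A minimal sketch of this check on simulated rather than real spike-in data. It assumes purely multiplicative (log-normal) noise; the intensity levels, noise level and seed are illustrative choices, and the helper names `iqr` and `iqr_median_corr` are hypothetical. Under that assumption the IQR tracks the median on the raw scale, while on the log2 scale the association largely disappears.

```python
import numpy as np

def iqr(x, axis):
    """Interquartile range along the given axis."""
    q75, q25 = np.percentile(x, [75, 25], axis=axis)
    return q75 - q25

def iqr_median_corr(x):
    """Correlation between per-row median and per-row IQR (rows = probe sets)."""
    return np.corrcoef(np.median(x, axis=1), iqr(x, axis=1))[0, 1]

rng = np.random.default_rng(0)
n_rep = 59                                   # number of replicates, as on the slide
levels = 2.0 ** np.linspace(5, 14, 30)       # hypothetical probe set intensity levels
# Simulated PM* values: one row per probe set, one column per replicate array.
pm = levels[:, None] * np.exp(0.3 * rng.standard_normal((levels.size, n_rep)))

corr_raw = iqr_median_corr(pm)               # spread grows with level
corr_log = iqr_median_corr(np.log2(pm))      # spread roughly constant
print(f"raw scale: {corr_raw:.2f}, log2 scale: {corr_log:.2f}")
```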
![Page 8: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/8.jpg)
8
Intensity scale – ALL
![Page 9: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/9.jpg)
9
Log Intensity scale – ALL
![Page 10: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/10.jpg)
10
Additivity
Look at log-log plots of PM* vs concentrations for 14 spike-in fragments
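One way to quantify what such a plot shows is to fit a straight line on the log-log scale; an approximately linear trend is what points toward an additive model for log2(PM*). The sketch below uses simulated values: the power-law response, the slope of 0.8 and the noise level are assumptions, not estimates from the spike-in data. Zero concentrations are dropped because log2(0) is undefined.

```python
import numpy as np

rng = np.random.default_rng(1)
conc = 2.0 ** np.arange(-2, 11)              # nonzero design concentrations 0.25 ... 1024
true_slope = 0.8                             # hypothetical response exponent
# Hypothetical PM* response: power law in concentration with log-normal noise.
pm = 40.0 * conc ** true_slope * 2.0 ** (0.1 * rng.standard_normal(conc.size))

# Least squares line on the log-log scale: log2(PM*) = a + b * log2(conc).
x, y = np.log2(conc), np.log2(pm)
b, a = np.polyfit(x, y, deg=1)               # polyfit returns the highest degree first
resid = y - (a + b * x)
print(f"slope {b:.2f}, residual SD {resid.std(ddof=2):.3f}")
```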
![Page 11: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/11.jpg)
11
PM.bgc.norm vs Conc log-log plot grp 1
![Page 12: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/12.jpg)
12
PM.bgc.norm vs Conc log-log plot grp 2
![Page 13: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/13.jpg)
13
Suggested additive model
Log-log plots of PM* vs concentrations suggest the following model:
$\log_2(PM^*_{ij}) = p_i + c_j + \varepsilon_{ij} \qquad (1)$
with $p_i$ a probe affinity effect, $c_j$ the log2-scale expression level for chip $j$, and $\varepsilon_{ij}$ an i.i.d. error term.
For identifiability we fit with the constraint
$\sum_i p_i = 0$.
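When every probe is observed on every chip, the least squares fit of (1) under $\sum_i p_i = 0$ has a closed form: $\hat c_j$ is the mean of column $j$, and $\hat p_i$ is the row mean minus the grand mean. A minimal sketch, assuming a complete $I \times J$ table of log2(PM*) values with no missing cells; `ls_additive_fit` is a hypothetical name.

```python
import numpy as np

def ls_additive_fit(y):
    """Least squares fit of y_ij = p_i + c_j + e_ij subject to sum_i p_i = 0.

    y: array of shape (I probes, J chips) holding log2(PM*) values.
    """
    y = np.asarray(y, dtype=float)
    c = y.mean(axis=0)                       # chip effects: column means
    p = y.mean(axis=1) - y.mean()            # probe effects: row means minus grand mean
    resid = y - p[:, None] - c[None, :]
    return p, c, resid

# Example on an exactly additive 4 x 3 table (hypothetical values).
p_true = np.array([-1.0, 0.0, 1.5, -0.5])    # sums to zero
c_true = np.array([8.0, 9.0, 10.0])
y = p_true[:, None] + c_true[None, :]
p_hat, c_hat, resid = ls_additive_fit(y)
```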
![Page 14: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/14.jpg)
14
Normality
We can examine residuals from a least squares fit of the model specified in (1) to check its adequacy in terms of additivity of effects and stability of variance.
The shape of the distribution of the residuals can also be compared with a Gaussian distribution to see how far off we are from this ideal.
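A simple numerical companion to a normal QQ plot is the correlation between the sorted residuals and the matching normal quantiles: it is close to 1 for Gaussian-looking residuals and drops for heavy tails. This summary, the plotting positions $(i - 0.5)/n$ and the simulated samples are our illustrative choices, not part of the original analysis.

```python
import numpy as np
from statistics import NormalDist

def normal_qq(resid):
    """Return (theoretical normal quantiles, sorted residuals) for a QQ plot."""
    r = np.sort(np.ravel(resid))
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    z = np.array([NormalDist().inv_cdf(p) for p in probs])
    return z, r

def qq_correlation(resid):
    """Correlation of the QQ plot points; near 1 when residuals look Gaussian."""
    z, r = normal_qq(resid)
    return np.corrcoef(z, r)[0, 1]

rng = np.random.default_rng(2)
gaussian = rng.standard_normal(500)
heavy_tailed = rng.standard_t(df=1, size=500)   # Cauchy-like tails
print(qq_correlation(gaussian), qq_correlation(heavy_tailed))
```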
![Page 15: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/15.jpg)
15
Res vs chip effects - grp 1
![Page 16: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/16.jpg)
16
Res vs chip effects - grp 2
![Page 17: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/17.jpg)
17
Figure – qqnorm of residuals from additive fit
![Page 18: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/18.jpg)
18
Res qqnorm - grp 1
![Page 19: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/19.jpg)
19
Res qqnorm - grp 2
![Page 20: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/20.jpg)
20
Analyzing the untransformed PM* values
On the untransformed scale, one can fit a multiplicative model (Li-Wong):
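$PM^*_{ij} = \phi_i \theta_j + \varepsilon_{ij} \qquad (2)$
with $\phi_i$ a probe sensitivity and $\theta_j$ the expression level on chip $j$.
The model is fitted by least squares, iteratively fitting the $\phi$s and the $\theta$s while regarding the other set as known. Fitting steps are interleaved with diagnostic checks used to exclude points from subsequent fits.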
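A minimal sketch of the alternating least squares step for (2), assuming a complete probe-by-chip table on the untransformed scale. It omits the diagnostic outlier-exclusion steps and runs a fixed number of iterations. The normalisation $\sum_i \phi_i^2 = I$ is an assumption made here for identifiability (the product $\phi_i\theta_j$ is unchanged by rescaling), and `li_wong_als` is a hypothetical name, not Li and Wong's software.

```python
import numpy as np

def li_wong_als(pm, n_iter=50):
    """Alternating least squares for PM*_ij = phi_i * theta_j + e_ij.

    pm: (I probes, J chips) array of untransformed PM* values.
    Sketch only: no outlier exclusion, no convergence test.
    """
    pm = np.asarray(pm, dtype=float)
    I, _ = pm.shape
    theta = pm.mean(axis=0)                  # start from chip-wise means
    for _ in range(n_iter):
        phi = pm @ theta / (theta @ theta)   # LS update of phi with theta fixed
        phi *= np.sqrt(I / (phi @ phi))      # assumed normalisation: sum(phi**2) = I
        theta = phi @ pm / (phi @ phi)       # LS update of theta with phi fixed
    return phi, theta

phi_true = np.array([0.5, 1.0, 1.5, 1.2, 0.8])      # hypothetical probe sensitivities
theta_true = np.array([100.0, 200.0, 400.0, 800.0]) # hypothetical chip levels
pm = np.outer(phi_true, theta_true)
phi_hat, theta_hat = li_wong_als(pm)
```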
![Page 21: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/21.jpg)
21
PM vs logConc - grp 1
![Page 22: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/22.jpg)
22
Res vs chip effects - grp 1
![Page 23: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/23.jpg)
23
qq - grp 1
![Page 24: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/24.jpg)
24
Why robust?
• Bad probes – probe outliers
• Bad chips – chip outliers
• Image artifacts – individual outliers
We would like a fitting procedure which yields good estimates in the presence of various types of outliers – individual points, probes, and chips.
![Page 25: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/25.jpg)
25
Robust estimation
• Huber, Hampel, Rousseeuw
• Gross errors, round off errors, wrong model.
• Distinction between the approach based on identifying and excluding outliers and the modelling approach.
![Page 26: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/26.jpg)
26
M estimators
• A general class of robust estimators is obtained as solutions to:
$\min_\beta \sum_i \rho(Y_i - X_i\beta)$
where $\rho$ is a symmetric function, or by solving the following system, with $\psi = \rho'$:
$\sum_i \psi(Y_i - X_i\beta)\, X_i = 0$
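The step from these estimating equations to iteratively reweighted least squares can be made explicit. Writing $r_i = Y_i - X_i\beta$ and defining a weight for each observation from its residual:

$$
\sum_i \psi(r_i)\,X_i = 0
\quad\Longleftrightarrow\quad
\sum_i w_i\,(Y_i - X_i\beta)\,X_i = 0,
\qquad
w_i = \frac{\psi(r_i)}{r_i}.
$$

For fixed weights the right-hand system is exactly the set of normal equations of a weighted least squares fit, so one can alternate between computing $w_i$ from the current residuals and re-solving the weighted fit. Because $\rho$ is symmetric, $\psi$ is odd and $\psi(r_i)/r_i = \psi(|r_i|)/|r_i|$; in practice the residuals are first divided by a robust scale estimate. At $r_i = 0$ the weight is taken as the limit $\psi'(0)$, assuming it exists.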
![Page 27: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/27.jpg)
27
Robust fit by IRLS for each probe set
Starting from an initial robust fit, at each iteration compute:
$S = c \cdot \mathrm{mad}(r_{ij})$ – robust estimate of the scale of $\varepsilon$
$u_{ij} = r_{ij}/S$ – rescaled residuals
$w_{ij} = \psi(|u_{ij}|)/|u_{ij}|$ – weights used in the next LS fit.
Theoretical considerations can lead to a specification of $\psi$. In practice, one selects a $\psi$ function with desirable characteristics.
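The iteration above can be sketched for a single probe set as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Huber's $\psi$ with the conventional tuning constant $k = 1.345$, takes $c = 1.4826$ (which makes the MAD consistent for the standard deviation under normality), starts from one median sweep, and runs a fixed number of iterations instead of testing convergence. The constraint $\sum_i p_i = 0$ is built into the design matrix by writing $p_I = -(p_1 + \dots + p_{I-1})$. All function names are hypothetical.

```python
import numpy as np

def design_matrix(I, J):
    """Design for y_ij = p_i + c_j with sum_i p_i = 0.

    Rows follow y.ravel() order (row = i * J + j).
    Columns: c_1..c_J, then p_1..p_{I-1}; p_I = -(p_1 + ... + p_{I-1}).
    """
    X = np.zeros((I * J, J + I - 1))
    for i in range(I):
        for j in range(J):
            row = i * J + j
            X[row, j] = 1.0                  # chip effect c_j
            if i < I - 1:
                X[row, J + i] = 1.0          # probe effect p_i
            else:
                X[row, J:] = -1.0            # last probe: minus the sum of the others
    return X

def huber_weight(u, k=1.345):
    """IRLS weight psi(|u|)/|u| for Huber's psi, i.e. min(1, k/|u|)."""
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def irls_fit(y, k=1.345, n_iter=20):
    """Robust fit of y_ij = p_i + c_j + e_ij (y = log2 PM*, probes x chips).

    Returns probe effects, chip effects and the weights used in the last LS fit.
    """
    I, J = y.shape
    X = design_matrix(I, J)
    yv = y.ravel()
    # Initial robust fit: chip medians, then centred probe medians.
    c0 = np.median(y, axis=0)
    p0 = np.median(y - c0, axis=1)
    p0 -= p0.mean()
    beta = np.concatenate([c0, p0[:-1]])
    for _ in range(n_iter):
        r = yv - X @ beta                                              # residuals r_ij
        S = max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)   # c * mad(r)
        w = huber_weight(r / S, k)                                     # from u_ij = r_ij / S
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], yv * sw, rcond=None)  # weighted LS
    c = beta[:J]
    p = np.append(beta[J:], -beta[J:].sum())
    return p, c, w.reshape(I, J)
```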
![Page 28: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/28.jpg)
28
Example functions
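The specific functions plotted on this slide are not recoverable from the transcript. As an illustration only, the sketch below defines three commonly used $\psi$ functions (Huber, Tukey's bisquare, and the least absolute deviation case $\rho(x) = |x|$) together with the IRLS weight $w(u) = \psi(|u|)/|u|$. The tuning constants 1.345 and 4.685 are conventional defaults assumed here, and the `eps` floor, which caps the LAD weight near zero, is our own choice.

```python
import numpy as np

def psi_huber(u, k=1.345):
    """Huber: linear near zero, constant beyond k."""
    return np.clip(u, -k, k)

def psi_bisquare(u, c=4.685):
    """Tukey bisquare: redescends to zero beyond c."""
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

def psi_lad(u):
    """Least absolute deviation: rho(u) = |u|, so psi(u) = sign(u)."""
    return np.sign(u)

def irls_weight(psi, u, eps=1e-6):
    """IRLS weight w(u) = psi(|u|)/|u|; |u| is floored at eps to avoid 0/0."""
    a = np.maximum(np.abs(np.asarray(u, dtype=float)), eps)
    return psi(a) / a

u = np.array([0.0, 1.0, 2.0, 5.0])
for name, psi in [("huber", psi_huber), ("bisquare", psi_bisquare), ("lad", psi_lad)]:
    print(name, np.round(irls_weight(psi, u), 3))
```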
![Page 29: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/29.jpg)
29
Options for fitting models to probe sets
Recall the model
$\log_2(PM^*_{ij}) = p_i + c_j + \varepsilon_{ij} \qquad (1)$
It can be fit by:
• Least squares
• Least absolute deviation ($\rho(x) = |x|$)
• IRLS using various $\psi$ functions
• One can also get a single-chip robust probe set summary.
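With a single chip, probe effects cannot be separated from the chip level, so the model reduces to $\log_2(PM^*_i) = c + \varepsilon_i$ and a robust summary is a robust estimate of location. The sketch below uses a Huber M-estimate computed by IRLS with the scale held fixed at the normalised MAD. This is one possible choice assumed for illustration (other single-chip summaries use other $\psi$ functions), the data are made up, and `huber_location` is a hypothetical name.

```python
import numpy as np

def huber_location(x, k=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of location by IRLS; scale fixed at the normalised MAD."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                            # robust starting value
    s = 1.4826 * np.median(np.abs(x - mu))
    if s == 0:
        return mu
    for _ in range(n_iter):
        u = (x - mu) / s
        w = np.minimum(1.0, k / np.maximum(np.abs(u), 1e-12))   # Huber weights
        new_mu = np.sum(w * x) / np.sum(w)      # weighted mean = weighted LS step
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical log2(PM*) values for one probe set on one chip, with one aberrant probe.
log_pm = np.array([8.0, 8.1, 7.9, 8.05, 7.95, 8.0, 8.02, 7.98, 14.0])
print(huber_location(log_pm), log_pm.mean())
```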
![Page 30: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/30.jpg)
30
Robust fit example A
![Page 31: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/31.jpg)
31
Actual vs fitted
![Page 32: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/32.jpg)
32
Starting weights
![Page 33: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/33.jpg)
33
Ending weights
![Page 34: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/34.jpg)
34
Robust analysis of multi-way tables
• Tukey and co-workers – median polish.
• Tukey – one degree of freedom test for non-additivity: additive or not, no partial judgement.
• Gentleman and Wilk – effect of one or two outliers on residuals.
• C. Daniel – estimate which cells are affected by interactions, and estimate the interactions in a set of cells.
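Since median polish serves as the reference method in these comparisons, a minimal sketch of Tukey's procedure may help: it alternately sweeps row medians and column medians out of the table into the probe and chip effects. The sweep order follows the usual formulation (as in R's `medpolish`), but the stopping rule, the iteration cap and the function name are our assumptions. A chip-level summary analogous to $c_j$ in (1) is `overall + col[j]`, with the probe effects centred on their median rather than summing to zero.

```python
import numpy as np

def median_polish(y, max_iter=10, tol=1e-6):
    """Tukey's median polish: y_ij = overall + row_i + col_j + r_ij."""
    r = np.array(y, dtype=float)
    I, J = r.shape
    overall, row, col = 0.0, np.zeros(I), np.zeros(J)
    for _ in range(max_iter):
        rmed = np.median(r, axis=1)          # sweep row medians into the row effects
        r -= rmed[:, None]
        row += rmed
        delta = np.median(col)               # move median of column effects to overall
        col -= delta
        overall += delta
        cmed = np.median(r, axis=0)          # sweep column medians into the column effects
        r -= cmed[None, :]
        col += cmed
        delta = np.median(row)               # move median of row effects to overall
        row -= delta
        overall += delta
        if np.abs(rmed).sum() + np.abs(cmed).sum() < tol:
            break                            # simple stopping rule (an assumption)
    return overall, row, col, r
```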
![Page 35: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/35.jpg)
35
Modified weights
Standard IRLS procedures determine weights from each cell of the two-way table individually.
We can also look at residuals across the cells in a row (column) to determine a weighting adjustment for the entire row (column):
$rw_i = \psi(|u|_{i\cdot})/|u|_{i\cdot}, \quad cw_j = \psi(|u|_{\cdot j})/|u|_{\cdot j}$
and get a composite weight for each cell:
$ww_{ij} = rw_i \cdot w_{ij}$
$www_{ij} = cw_j \cdot rw_i \cdot w_{ij}$
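A minimal sketch of the row and column adjustments, assuming Huber weights and taking the row (column) statistic as $\mathrm{mad}_i/\mathrm{mad}$, the ratio of the residual MAD within a row (column) to the overall residual MAD, in line with the heuristic derivation of these weights. These choices, the constant 1.4826 and the function names are illustrative assumptions; the composite weights would then feed the next weighted least squares step of the IRLS fit.

```python
import numpy as np

def mad(x, axis=None):
    """Normal-consistent median absolute deviation (constant 1.4826)."""
    med = np.median(x, axis=axis, keepdims=True)
    return 1.4826 * np.median(np.abs(x - med), axis=axis)

def huber_weight(t, k=1.345):
    """psi(|t|)/|t| for Huber's psi, i.e. min(1, k/|t|)."""
    t = np.abs(t)
    return np.where(t <= k, 1.0, k / np.maximum(t, 1e-12))

def composite_weights(r, k=1.345):
    """Cell, row (probe) and column (chip) weights from a residual matrix r."""
    S = mad(r)                                 # overall robust scale
    w = huber_weight(r / S, k)                 # w_ij: individual cell weights
    rw = huber_weight(mad(r, axis=1) / S, k)   # rw_i from mad_i / mad over row i
    cw = huber_weight(mad(r, axis=0) / S, k)   # cw_j from mad_j / mad over column j
    ww = rw[:, None] * w
    www = cw[None, :] * rw[:, None] * w
    return w, rw, cw, ww, www
```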
![Page 36: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/36.jpg)
36
Heuristic derivation of weights
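Consider the model with interactions:
$\log_2(PM^*_{ij}) = p_i + c_j + \gamma_{ij} + \varepsilon_{ij}$, with $\gamma_{ij}$ a probe-by-chip interaction.
We can think of $T_{ij} = |r_{ij}|/S$ as a test statistic for
$H_0: \gamma_{ij} = 0$ vs $H_1: \gamma_{ij} \neq 0$
and of $w_{ij} = \psi(T_{ij})/T_{ij}$ as a transformation of this test statistic into a weight.
Similarly, one could use $T_i = |r_{i\cdot}|/S = \mathrm{mad}_i/\mathrm{mad}$ to test
$H_0: \gamma_{ij} = 0,\ j=1,\dots,J$ vs $H_1: \gamma_{ij} \neq 0$ for some $j$
and map this statistic into a weight.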
![Page 37: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/37.jpg)
37
See how it does
• Look at the initial (individual) weights and fit vs. the adjusted weights and fit, and then at convergence.
• Also look at probe weights in all spike-in probe sets.
• Column weights
![Page 38: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/38.jpg)
38
Starting weights
![Page 39: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/39.jpg)
39
Ending weights
![Page 40: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/40.jpg)
40
Robust fit – composite weights
![Page 41: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/41.jpg)
41
Probe Weights
![Page 42: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/42.jpg)
42
Low-weight Probes - 1
![Page 43: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/43.jpg)
43
Low-weight Probes - 2
![Page 44: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/44.jpg)
44
Low-weight Probes - 3
![Page 45: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/45.jpg)
45
Chip Weights
![Page 46: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/46.jpg)
46
Note on multi chip context
Note that the residual variance in the model without probe effects (the single chip analysis set-up) is about 6 times the residual variance in the model with probe effects, i.e.
$\log_2(PM^*_{ij}) = p_i + c_j + \varepsilon_{ij}$
vs.
$\log_2(PM^*_{ij}) = c_j + \varepsilon_{ij}$
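The comparison can be made on any complete table by fitting both models by least squares and dividing each residual sum of squares by its residual degrees of freedom: $(I-1)(J-1)$ with probe effects and $IJ - J$ without. In the sketch below, the probe-effect spread, noise level and table size are arbitrary assumptions, so the resulting ratio says nothing about the roughly 6-fold figure reported for the real data; it only shows how the comparison is computed. `residual_variances` is a hypothetical name.

```python
import numpy as np

def residual_variances(y):
    """Residual variances of the LS fits with and without probe effects.

    y: (I probes, J chips) array of log2(PM*) values.
    """
    I, J = y.shape
    c = y.mean(axis=0)                       # chip effects (same LS estimate in both models)
    p = y.mean(axis=1) - y.mean()            # probe effects with sum(p) = 0
    r_multi = y - p[:, None] - c[None, :]    # model with probe effects
    r_single = y - c[None, :]                # model without probe effects
    v_multi = (r_multi ** 2).sum() / ((I - 1) * (J - 1))
    v_single = (r_single ** 2).sum() / (I * J - J)
    return v_multi, v_single

rng = np.random.default_rng(5)
probe = rng.standard_normal(16)              # hypothetical probe affinities
probe -= probe.mean()
chip = np.linspace(6.0, 9.0, 12)             # hypothetical chip levels
y = probe[:, None] + chip[None, :] + 0.3 * rng.standard_normal((16, 12))
v_multi, v_single = residual_variances(y)
print(f"ratio single/multi: {v_single / v_multi:.1f}")
```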
![Page 47: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/47.jpg)
47
Compare fits on sample probe sets
![Page 48: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/48.jpg)
48
Affy Probe Data Analysis – X hybridizing probe sets
![Page 49: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/49.jpg)
49
X-Hybe probe 3
![Page 50: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/50.jpg)
50
X-Hybe probe 3 – PM vs Phi, Theta
![Page 51: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/51.jpg)
51
X-Hybe probe 4
![Page 52: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/52.jpg)
52
X-Hybe probe 4 - PM vs Phi, Theta
![Page 53: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/53.jpg)
53
X-Hybe probe 5
![Page 54: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/54.jpg)
54
X-Hybe probe 5 – PM vs Phi, Theta
![Page 55: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/55.jpg)
55
Affy Probe Data Analysis – Spike-in probe sets
![Page 56: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/56.jpg)
56
Fit to spike 12
![Page 57: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/57.jpg)
57
Fit to spike 7
![Page 58: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/58.jpg)
58
Fit to spike 1
![Page 59: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/59.jpg)
59
Fit to spike 2
![Page 60: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/60.jpg)
60
Fit to spike 3
![Page 61: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/61.jpg)
61
Fit to spike 4
![Page 62: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/62.jpg)
62
Fit to spike 5
![Page 63: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/63.jpg)
63
What have we gained?
• Look at residuals from the fit across a large number of probe sets to show the benefits of IRLS over median polish.
![Page 64: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/64.jpg)
64
Boxplot of residuals from 1000 probe sets
![Page 65: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/65.jpg)
65
Boxplot of estimated chip effects for 1000 probe sets
![Page 66: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/66.jpg)
66
IQR of chip effects for 1000 probe sets
![Page 67: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/67.jpg)
67
What have we gained?
• In order to have a high enough breakdown point to ignore 6 out of 16 probes, and improve on the median polish estimate, we pay a high price in variability.
Q. Can we find a better weighting scheme? Or show that median polish is optimal?
![Page 68: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/68.jpg)
68
References
1. R. Irizarry et al., Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Research, 2003, Vol. 31, No. 4, e15.
2. R. Irizarry et al., Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics, 2003, in press.
3. C. Li and W. H. Wong, Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection, Proceedings of the National Academy of Sciences USA, 2001, Vol. 98, pp. 31-36.
![Page 69: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/69.jpg)
69
References - Robustness
1. P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, 1987.
2. P. J. Huber, Robust Statistics, John Wiley & Sons, 1981.
3. F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, W. A. Stahel, Robust Statistics: The approach based on influence functions, John Wiley & Sons, 1986.
![Page 70: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/70.jpg)
70
References – Robustness in multiway tables
4. J. D. Emerson and D. C. Hoaglin, Analysis of two-way tables by medians, in Understanding Robust and Exploratory Data Analysis, ed. D. C. Hoaglin, F. Mosteller and J. W. Tukey, John Wiley & Sons, 1983.
5. N. Cook, Three-way analyses, in Exploring Data Tables, Trends, and Shapes, ed. D. C. Hoaglin, F. Mosteller and J. W. Tukey, 1985.
6. J. W. Tukey, One degree of freedom for non-additivity. Biometrics, 1949, 5, 232-242.
![Page 71: Models and methods for summarizing GeneChip probe set data](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814749550346895db48953/html5/thumbnails/71.jpg)
71
References – Robustness in multiway tables
7. C. Daniel, Patterns in residuals in the two-way layout, Technometrics, 1978, 20(4), 385-395.
8. J. F. Gentleman and M. B. Wilk, Detecting outliers in a two-way table: I. Statistical behavior of residuals, Technometrics, 1975, 17(1), 1-14.
9. J. F. Gentleman and M. B. Wilk, Detecting outliers: II. Supplementing the direct analysis of residuals, Biometrics, 1975, 31, 387-410.