KEGG: new perspectives on genomes, pathways, diseases and drugs
Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier...
Transcript of Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier...
![Page 1: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/1.jpg)
Generating quality metrics reports for
microarray data sets
Audrey Kauffmann
![Page 2: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/2.jpg)
2
Introduction
• Microarrays are widely/routinely used
• Technology and protocol improvements → trustworthy
• Variance and noise– Technical causes:
• Platform
• Lab, experimentalist
• RNA extraction
• Amplification, labeling, hybridization, scanning...
– Biological causes:
• Tissue itself (cell lines, biopsies, blood...)
• Tissue contamination
• Clinical covariates (age, sex, race...)
• Cell cycle...
![Page 3: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/3.jpg)
3
Who is concerned?• Experimentalist investigating a set of samples
– Choose between different technology platforms, expt. Protocols– Decide when to repeat (certain parts of) the experiment
• Statistical collaborator analysing the experiment– Decide whether to proceed or to ask the experimentalist to go back to the lab
• Microarray core facility– Decide whether to consider their product fit for delivery to customer– Customer decides whether to be content (pay the bill)
• Integrative biologist analysing data in a public database– Has to choose which experiments– Which arrays within an experiment to consider
• Public data(base) provider– Put a quality score on each of her offerings
![Page 4: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/4.jpg)
4
At which step of the analysis?
• Importing the data
• Preprocessing: background correction,
normalisation, summarization of probesets
• Differential Expression
• Gene set enrichment analysis
![Page 5: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/5.jpg)
5
At which step of the analysis?
• Importing the data
• Quality Assessment
• Preprocessing: background correction,
normalisation, summarization of probesets
• Quality Assessment
• Differential Expression
• Gene set enrichment analysis
![Page 6: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/6.jpg)
6
At which step of the analysis?
• Importing the data
• Quality Assessment
• Preprocessing: background correction,
normalisation, summarization of probesets
• Quality Assessment
• Differential Expression
• Gene set enrichment analysis
Remove outlier(s)
![Page 7: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/7.jpg)
7
What aspects to be evaluated?Which quality metrics?
Per Slide
• What are we looking at?– Intensitydependent ratio– Detection of spatial effects
• How?– MAplots– Representation of the chip
Between Slides
• What are we looking at?– Homogeneity– Outlier samples– Biological meaning
• How?– Boxplots, density plots– Heatmap, PCA
![Page 8: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/8.jpg)
8
How to easily perform quality assessment?
• arrayQualityMetrics Bioconductor package for Affymetrix,
Agilent, Illumina, homemade arrays …
• From an R object ⇒ HTML report
• Plots:– MA plot and spatial representations
– Boxplots and density
– Heatmap and PCA
– Variancemean dependency
– GC content and probe mapping studies
– Affymetrix only: NUSE, RLE, RNA degradation, QCstats, PM/MM
• Outlier identification
![Page 9: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/9.jpg)
9
Handson
• Run reports
![Page 10: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/10.jpg)
10
arrayQualityMetrics report – outlier detection
1.5 * IQR
outliers
Boxplot of the scores S
![Page 11: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/11.jpg)
11
Fourier Transform
S i=∣ik
arrayQualityMetrics report – per array
S i=∑ lowFreqik
∑ highFreq ik
![Page 12: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/12.jpg)
12
S i =median I ik S i =IQR I ik
arrayQualityMetrics report – intensity distributions
![Page 13: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/13.jpg)
13
d ij =mean k ∣I ik−I jk∣
For each couple of arrays i and j, k is a probe and the distance between the arrays is:
S i=∑ d ij
arrayQualityMetrics report – Between arrays
![Page 14: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/14.jpg)
14
arrayQualityMetrics report – Affymetrix plots
![Page 15: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/15.jpg)
15
S i =median I ik S i =IQR I ik
arrayQualityMetrics report – Affymetrix NUSE: Normalised Unscaled Standard Error
• Fitting a probe level model (gene k, array j)
• Differences in variability between genes. An array with elevated SE
(standard error) relative to the other arrays is of lower quality
![Page 16: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/16.jpg)
16
Why is outlier detection important? Example
• ArrayExpress experiment EMEXP886, cerebellar gene
expression:
– 5 WT mice (15 weeks of age)
– 5 Atnx1 KO mice (15 weeks of age)
• Affymetrix MOE430A (mouse) Genechip
• Ataxin 1 (Atxn1): protein of unknown function associated
with cerebellar neurodegeneration in SpinoCerebellar
Ataxia type 1 (SCA1), which impairs the of eye movement
![Page 17: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/17.jpg)
17
Results from EMEXP886 analysis
• Moderated ttest
• Most enriched KEGG Pathway
5.65410 samples
Corrected tvalue# Significant genes
Neuroactive ligandreceptor interaction
43410 samples
P < 0.001P < 0.01
# Genes
![Page 18: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/18.jpg)
18
Outlier array’s impact on results• Moderated ttest (limma)
• Most enriched KEGG Pathway
43410 samples
14190Without array 1
P < 0.001P < 0.01
# Genes
11.5323Without array 1
5.65410 samples
Corrected tvalue# Significant genes
Neuroactive ligandreceptor interaction
![Page 19: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/19.jpg)
19
Validation of outlier’s array specificity
• Array #1 specific effect?
⇒Remove one by one each array and perform the moderated ttest
# Genes
P < 0.01 P < 0.001
10 samples 34 4
Without sample 1 190 14
Without sample 2 39 3
Without sample 3 29 2
Without sample 4 21 1
Without sample 5 12 1
Without sample 6 87 5
Without sample 7 23 4
Without sample 8 34 4
Without sample 9 17 2
Without sample 10 23 2
![Page 20: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/20.jpg)
20
Horror Picture Show
![Page 21: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/21.jpg)
21
![Page 22: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/22.jpg)
22
![Page 23: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/23.jpg)
23
![Page 24: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/24.jpg)
24
![Page 25: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/25.jpg)
25
![Page 26: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/26.jpg)
26
![Page 27: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/27.jpg)
27
![Page 28: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/28.jpg)
28
![Page 29: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway](https://reader036.fdocuments.in/reader036/viewer/2022071104/5fde6028906d4a7bbe2661e0/html5/thumbnails/29.jpg)
29
Conclusions
• Quality assessment is important– Still needed– Give a first “taste” of the data– By removing outlier, the statistical power is increased
• arrayQualityMetrics package– One command line– Before preprocessing: to decide which normalisation– After normalisation: to check normalisation efficiency– Comprehensive report– Outlier detection