Artifacts and Effects in Gene Expression Data Carlo Colantuoni April 12, 2006.
Artifacts and Effects in Gene Expression Data
Carlo Colantuoni
April 12, 2006
Experimental Artifacts
~200 microarrays, ~100 samples
Nylon P33 arrays
NIA cDNA Microarray Core Facility
9600 MGC elements
Uncorrected Intensities: MDS Colored by Batch
Removing The Batch Effect
We Will Use These Dimensions for Additional Corrective Transformations
Much Like Red:Green Analysis
Uncorrected Intensities: MDS Colored by Batch
Batch Subtracted Measures: MDS Colored by Batch
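The slides do not spell out how the batch effect was subtracted. A minimal sketch, assuming "batch subtracted" means centering each gene's log intensities on its own batch mean (the function name and toy numbers below are illustrative, not from the talk):

```python
from collections import defaultdict

def subtract_batch_means(values, batches):
    """Center one gene's log intensities within each batch.

    values  : log intensities for one gene, one per array
    batches : batch label for each array (same length as values)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, b in zip(values, batches):
        sums[b] += v
        counts[b] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    # Subtract the mean of the array's own batch from each value.
    return [v - means[b] for v, b in zip(values, batches)]

# Toy example (made-up numbers): batch B sits about 2 log units above batch A.
gene = [5.0, 5.2, 7.1, 6.9]
batch = ["A", "A", "B", "B"]
corrected = subtract_batch_means(gene, batch)
```

After this step every batch has mean zero for the gene, so an MDS plot colored by batch should no longer separate arrays by batch along that gene.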
MDS of All Array Experiments: Subject Replicates
Hybridization Artifacts
A “Simple” Pilot:
2 subjects in rep. = 4 arrays
Differing amounts of dye
2-color (reference)
~48,000 probes
4 arrays: Raw Log Intensities
4 arrays: Raw Linear Intensities
1 array: Ratio v. Intensity
1 array: Ratio v. Intensity
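A ratio-versus-intensity plot for one two-color array can be sketched as follows. The log base (10), the channel placed in the numerator, and the function name are assumptions; the slides give only the plot title.

```python
import math

def ratio_vs_intensity(sample, reference):
    """Return (intensity, ratio) pairs for one two-color array.

    intensity = mean of the two log10 channel intensities
    ratio     = log10(sample / reference)
    """
    points = []
    for s, r in zip(sample, reference):
        log_ratio = math.log10(s / r)
        mean_log_intensity = 0.5 * (math.log10(s) + math.log10(r))
        points.append((mean_log_intensity, log_ratio))
    return points

# Toy probes (made-up intensities): one up 100-fold, one unchanged.
points = ratio_vs_intensity([1000.0, 100.0], [10.0, 100.0])
```

A trend of the ratio with intensity in such a plot is the kind of hybridization artifact this pilot is meant to expose.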
Biological Effects
… or are they?
Big Effects:
Tissue Types and Growth Factor Treatments
Illumina 24K
Smaller Effects:
Correlation of Gene Expression with Biological Indices
pH
PMI
age
Nylon P33 10K
Illumina custom 700
More Subtle Effects:
Differential Gene Expression by Genotype
COMT Val158Met SNP Affects Cognition and Risk for Schizophrenia
COMT enzyme activity
Genetics
Cognition & Disease
Risk for Schizophrenia
Working Memory Performance
Patterns of Cortical Activation
Amphetamine & Tolcapone Response
VV VM MM
p<0.00002
Over-Expression of HSP70 in VV Homozygotes
VV-VM
Effect of COMT V158M on Gene Expression
Nylon P33 10K
MM-VM
Effect of COMT V158M on Gene Expression
Nylon P33 10K
VV-MM
Effect of COMT V158M on Gene Expression
Nylon P33 10K
Figure axes: VV-VM T-stat; MM-VM T-stat
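The genotype contrasts above (VV-VM, MM-VM, VV-MM) are summarized per gene by a T statistic. The slides do not say which t-test was used; the sketch below assumes an unequal-variance (Welch) two-sample statistic, and the expression values are invented for illustration.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Two-sample t statistic with unequal variances (Welch)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Toy expression values for one gene in two genotype groups (made up).
vv = [1.0, 2.0, 3.0]
vm = [0.0, 1.0, 2.0]
t_vv_vm = welch_t(vv, vm)
```

Computing one such statistic per gene for two contrasts gives the pair of coordinates plotted against each other in the T-stat scatter.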
Looking Across Multiple Effects: Age and Genotype
N=15 genes across 80 subjects
p<7.34e-13
Alternative Approaches
COMT Activity as a Function of COMT Genotype
Correlation of COMT Activity with Expression
Figure: Distribution of Observed (black) and Permuted (blue) Correlations (r). x-axis: Correlation (r); y-axis: Density.
N=64, r=0.45, p<0.000089
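The observed-versus-permuted comparison can be sketched as below: correlate each gene with the index (COMT activity, age, pH, PMI), then repeat after shuffling the index across subjects to build a null distribution. The number of permutations, the seed, and all data values are assumptions made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permuted_correlations(index, expression_by_gene, n_perm=100, seed=0):
    """Pool gene-wise correlations over n_perm shuffles of the index."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = index[:]
        rng.shuffle(shuffled)  # breaks the subject-to-index pairing
        null.extend(pearson_r(shuffled, expr) for expr in expression_by_gene)
    return null

# Toy data (made up): 6 subjects, 2 genes.
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
genes = [[1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
         [0.3, -0.1, 0.4, 0.0, 0.2, -0.2]]
observed = [pearson_r(age, g) for g in genes]
null = permuted_correlations(age, genes, n_perm=50, seed=1)
```

Shuffling the index rather than each gene separately keeps the gene-to-gene correlation structure intact in the null distribution.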
Acknowledgements
Clinical Brain Disorders Branch, NIMH, NIH
Daniel Weinberger
Section on Neuropathology: Joel Kleinman, Thomas Hyde
Tissue Resources: Mary Herman
Amy Deep-Soboslay, Colleen Lynch
Genotyping: Richard Straub
Bhaskar Kolachana
COMT Activity: Jingshan Chen, Samer Helem
RNA Resources: Johanna Creswell, Claudia Aguirre, Robert Fatula
Jeet Bahra, Isha Khan
Debora Rothmond, Barbara Lipska
Nick Be, Mariam Khan
National Institute on Drug Abuse, NIH, DHHS: William Freed, Elin Lehrmann
National Institute on Aging, NIH, DHHS: Kevin Becker, William Wood
Diane Teichberg
Johns Hopkins School of Public Health, Department of Biostatistics
Scott Zeger, Zhianqan Tan, Rafael Irizarry
Giovanni Parmigiani, Elizabeth Johnson
NHGRI Microarray Facility: Abdel Elkahloun
Iddil Berkov, CBDB
Beyond Individual Genes: Functional Gene Groups
• Borrow statistical power across entire dataset
• Beyond threshold enrichment
• Systematic patterns throughout the dataset
Correlation of Age with Gene Expression
Figure: Distribution of Observed (black) and Permuted (red+blue) Correlations (r). x-axis: Correlation (r); y-axis: Density.
Over-Expression of HSP70’s in VV Homozygotes
p<7.42e-08
T statistic
3 Statistical Tests:
χ²
Kolmogorov-Smirnov
“Information”
Is THIS …
… Different from THIS?
χ² is the sum of D values over histogram bins, where:
$D = \frac{(O-E)^2}{E}$
O = observed count in a bin, E = expected count in that bin
All Genes
Subset of Interest
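The χ² comparison of a gene subset against all genes can be sketched as follows. Taking the expected counts E from the all-genes histogram, rescaled to the subset size, is an assumption; the bin counts are invented.

```python
def expected_counts(all_gene_counts, subset_size):
    """Scale the all-genes histogram to the size of the subset."""
    total = sum(all_gene_counts)
    return [subset_size * c / total for c in all_gene_counts]

def chi_square_bins(observed, expected):
    """Return per-bin D = (O - E)^2 / E and their sum (chi-square).

    Every expected count must be positive.
    """
    d_values = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return d_values, sum(d_values)

# Toy histograms over 3 bins (made-up counts).
all_genes_bins = [250, 500, 250]
subset_bins = [10, 8, 2]
expected = expected_counts(all_genes_bins, sum(subset_bins))
d_values, chi2 = chi_square_bins(subset_bins, expected)
```

Here the subset is shifted toward the first bin, and that bin contributes most of the statistic.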
Kolmogorov-Smirnov
All Genes
Subset of Interest
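The Kolmogorov-Smirnov comparison asks how far apart the cumulative distributions of the two sets of values are. A minimal sketch of the two-sample statistic, with invented values; a library routine such as `scipy.stats.ks_2samp` would also return a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # The gap between two step functions is largest at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Toy expression scores (made up).
all_genes = [-1.0, -0.5, 0.0, 0.5, 1.0]
subset = [0.5, 1.0, 1.5]
ks = ks_statistic(all_genes, subset)
```

Unlike the binned χ² approach, this uses the values directly and needs no choice of histogram bins.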
Product of Individual Probabilities
χ² is the sum of D values over histogram bins, where:
$D = \frac{(O-E)^2}{E}$
$D_{PCA} = \frac{O-E}{E^{0.5}}$
Figure: PCA of gene group distributions (N = 89 Gene Subsets, plus All Genes). Axes: Dimension #1, Dimension #2. Points colored by p value (0.0 to >0.1).
Inset histograms for labeled groups. x-axis: Log10 Ratio Z-Score; y-axis: Proportion of Genes.
#30 Pent. Phos.: N = 20, p<0.001
#51 Fruc. Mann.: N = 25, p<0.032
#600 Sphingo-Glycolip.: N = 94, p<0.097
#562 IP3: N = 96, p<0.110
#240 Pyrimid. Metabo.: N = 44, p<0.996
#740 Ribo-flavin: N = 17, p<0.999
#540 Lipo-Polysacch.: N = 3, p<0.079
#300 Lys. Biosyn.: N = 4, p<0.107
#252 Aln. Asp.: N = 24, p<0.079
#670 C by folate: N = 7, p<0.133
The distribution of gene expression values for each gene group is passed to PCA as D^0.5 values and then plotted as a single point in low dimensional space.
Distance from center indicates deviation from the distribution of all gene expression values in the microarray experiment.
Proximity indicates similarity in the shape of distributions.
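A short sketch of the values handed to PCA, assuming each gene group is represented by one signed value (O-E)/E^0.5 per histogram bin. The PCA step itself is omitted and the counts are invented.

```python
import math

def signed_root_deviation(observed, expected):
    """Per-bin (O - E) / sqrt(E); every expected count must be positive."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Toy bin counts for one gene group (made up).
observed_bins = [10, 8, 2]
expected_bins = [5.0, 10.0, 5.0]
d_pca = signed_root_deviation(observed_bins, expected_bins)

# The squared length of this vector equals the group's chi-square.
squared_length = sum(v * v for v in d_pca)
```

Because the squared length of each group's vector is its χ², groups that deviate strongly from the all-genes distribution lie far from the origin before any projection, and groups with similar signed bin patterns lie close together.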
$D = \frac{(O-E)^2}{E}$
$D_{PCA} = \frac{O-E}{E^{0.5}}$
Analysis of Gene Networks
No Effect of Other COMT SNPs: P3224
Figure: Permuted and Observed distributions of the T statistic for the 1/1-1/2 contrast (N=21, N=30).
Distribution of p-values
Figure: Distribution of p-values from Observed (Black) and Permuted Data. x-axis: p-value; y-axis: Density. N=90
Correlation of Age with Gene Expression
Figure: Distribution of Observed (black) and Permuted (red+blue) Correlations (r). x-axis: Correlation (r); y-axis: Density. N=90
Correlation of Age with Gene Expression
Figure: Distribution of Observed (black) and Permuted (red+blue) Correlations (r), tail from r = -0.45 to r = -0.30. x-axis: Correlation (r); y-axis: Density.
$FDR = \frac{\text{False Pos.}}{\text{Total Pos.}} = \frac{\text{Permuted}}{\text{Observed}}$
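The FDR ratio can be sketched as the number of permuted correlations beyond a threshold (averaged per permutation) divided by the number of observed correlations beyond it. Using the negative tail, the cap at 1, and the toy values are assumptions.

```python
def estimate_fdr(observed_r, permuted_r, n_permutations, threshold):
    """Estimate FDR for calling genes with r <= threshold.

    observed_r     : one correlation per gene from the real data
    permuted_r     : correlations pooled over all permutations
    n_permutations : how many permutations were pooled
    """
    total_pos = sum(1 for r in observed_r if r <= threshold)
    if total_pos == 0:
        return None  # nothing called, so the ratio is undefined
    false_pos = sum(1 for r in permuted_r if r <= threshold) / n_permutations
    return min(1.0, false_pos / total_pos)

# Toy values (made up): 5 genes, 2 permutations pooled.
observed_r = [-0.45, -0.40, -0.35, -0.10, 0.20]
permuted_r = [-0.32, -0.10, 0.00, 0.10, 0.30,
              -0.31, 0.05, 0.15, -0.20, 0.25]
fdr = estimate_fdr(observed_r, permuted_r, n_permutations=2, threshold=-0.30)
```

In this toy case three genes pass the threshold and one is expected by chance, giving an estimated FDR of one third.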
Correlation of GFAP Expression with Age
r=0.47, p<0.000002
Figure axes: Age (yr); Expression: Log(Ratio)
Figure axis: SD Units from Mean (p<0.02)
2 arrays (4 channels)