Removing Unwanted Variation from methylation array data… All you need is RUV! - Jovana Maksimovic
The power of RUV
To assess the performance of RUV, we need to know some “truth”
• To define “truth”, we used limma to perform a differential methylation analysis of 450k ageing data (20 newborns vs. 20 centenarians; Heyn et al., PNAS, 2012 Jun 26;109(26):10522-7). This is the truth set (Fig. 5a)
• We then simulated a batch effect in the same data by modifying a
random subset of 18 newborn and 4 centenarian samples
• The batch effect was introduced by scaling the green channel
intensities of the selected samples
• Every green channel intensity within a sample was scaled using values randomly sampled from a normal distribution (mean = 10, SD = 100). This was repeated independently for each selected sample, creating additional variation within the batch (e.g. mimicking sample quality differences) that cannot be modelled explicitly. This is the test set (Fig. 5c)
• We compared the performance of two RUV flavours (RUV-inverse and RUV-ridge inverse) with other adjustment methods in a differential methylation analysis of the test set
• The competing methods were selected as they are routinely used to
adjust for batch effects and other unwanted variation in differential analyses
• In a receiver operating characteristic (ROC) analysis (Fig. 6), both RUV flavours significantly outperformed the other methods
• RUV-inverse with empirical controls performed the best (Fig. 6)
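The evaluation design above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: sample and CpG IDs are plain strings, the ranked list is ordered from most to least significant (e.g. by limma p-value), and the function names are hypothetical. The area under the curve (AUC) is added as a common one-number summary of an ROC curve; the poster itself shows only the curves.

```python
import random


def pick_batch_samples(newborns, centenarians, n_newborn=18, n_centenarian=4, seed=None):
    """Randomly choose which samples receive the simulated batch effect.

    The 18/4 split mirrors the poster; the sample IDs are illustrative.
    """
    rng = random.Random(seed)
    return rng.sample(newborns, n_newborn) + rng.sample(centenarians, n_centenarian)


def roc_points(ranked_cpgs, truth_set, all_cpgs):
    """Return (FPR, TPR) pairs obtained by walking down a ranked list of CpGs.

    ranked_cpgs: CpG IDs ordered from most to least differentially methylated.
    truth_set:   CpG IDs called differentially methylated in the truth set
                 (assumed to be a subset of all_cpgs).
    all_cpgs:    every CpG tested, used to count the true negatives.
    """
    truth = set(truth_set)
    n_pos = len(truth)
    n_neg = len(set(all_cpgs)) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for cpg in ranked_cpgs:
        # Each step down the list either finds a true DM CpG or a false positive.
        if cpg in truth:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Running `roc_points` once per adjustment method on that method's ranked list from the test set would give one curve per method, analogous to Fig. 6.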
(2) MDS Plot: Batch effects in microarray data (samples labelled batch 1 and batch 2). Adapted from Lazar C et al., Brief Bioinform, 2012
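Y = M-values = log2(Meth. intensity / Unmeth. intensity)
X = factors of interest, e.g. disease state, treatment
Z = observed covariates, e.g. sex, ethnicity
W = unobserved unwanted factors (estimated using negative controls)
$$Y_{m \times n} = X_{m \times p}\,\beta_{p \times n} + Z_{m \times q}\,\gamma_{q \times n} + W_{m \times k}\,\alpha_{k \times n} + \varepsilon_{m \times n}$$
where m is the number of samples, n the number of CpG probes, and p, q and k the numbers of factors of interest, observed covariates and unwanted factors, respectively
(3) RUV Linear Model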
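To make the model concrete, here is a NumPy sketch of the basic RUV-2 recipe (Gagnon-Bartsch & Speed, 2012): estimate W from the negative-control columns of Y by a singular value decomposition, then fit every CpG by least squares on [X, Z, W]. It is a simplified illustration, not RUV-inverse or RUV-ridge inverse, and not the authors' code. The function names are hypothetical, the small offset in `m_values` is our own assumption to avoid log(0), and a real analysis would pass the adjusted fit to limma for moderated statistics rather than stopping at OLS coefficients.

```python
import numpy as np


def m_values(meth, unmeth, offset=1.0):
    """M = log2(Meth / Unmeth); the offset is an assumption guarding against zeros."""
    return np.log2((meth + offset) / (unmeth + offset))


def ruv2_fit(Y, X, ctl, k, Z=None):
    """Estimate W from negative controls, then fit Y = X b + Z g + W a + e by OLS.

    Y:   m samples x n CpGs matrix of M-values.
    X:   m x p design matrix for the factors of interest.
    ctl: boolean mask of length n marking negative-control columns.
    k:   number of unwanted factors to estimate.
    Z:   optional m x q matrix of observed covariates.
    Returns (beta_hat, W_hat), with beta_hat of shape p x n.
    """
    Yc = Y[:, ctl]
    if Z is not None:
        # Remove the observed covariates from the controls before factor analysis.
        gamma_c, *_ = np.linalg.lstsq(Z, Yc, rcond=None)
        Yc = Yc - Z @ gamma_c
    # Factor analysis via SVD: the leading left singular vectors span the
    # unwanted variation shared by the control probes.
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :k] * s[:k]
    # Joint least-squares fit of every CpG on [X, Z, W].
    blocks = [X] + ([Z] if Z is not None else []) + [W]
    coef, *_ = np.linalg.lstsq(np.hstack(blocks), Y, rcond=None)
    return coef[: X.shape[1], :], W
```

Because the controls carry no signal for the factor of interest, whatever structure they share is attributed to W, and including W in the design stops that structure from leaking into the estimates of β.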
Conclusion
1. Simply the best! RUV performs better than all other methods tested in a differential methylation analysis of 450k array data
2. RUV-inverse performs best when using an appropriately defined set of empirical controls
3. The RUV analysis pipeline is easily implemented in R for 450k data
Aim
1. Compare the new crazy little thing called RUV to existing methods for removing unwanted variation from 450k array data
2. Devise a “best practice” pipeline for removing unwanted variation from 450k data
What is RUV?
RUV = Remove Unwanted Variation
• A family of methods (Gagnon-Bartsch et al., manuscript in preparation) that extends the RUV-2 framework for correcting unwanted variation (Gagnon-Bartsch & Speed, Biostatistics, 2012 Jul;13(3):539-52)
• The RUV methods use negative control genes/probes to infer unwanted factors from the data
• Negative controls should not be associated with the factor of interest, but should capture unwanted variation
• Adjustment is applied at the stage of the differential comparison via the linear modelling framework (Fig. 3)

Let’s get it started…
Illumina Infinium HumanMethylation450 BeadChip
• The 50bp Infinium methylation probes query a [C/T] polymorphism
created by bisulfite conversion of unmethylated cytosines in the genome.
• The 450k array interrogates 485,577 sites, the vast majority of which are CpG sites
• Infinium I probe type (~25%) (Fig. 1a)
• Infinium II probe type (~75%) (Fig. 1b)
[Fig. 1 schematic: probe binding to methylated (M) and unmethylated (U) DNA after bisulfite conversion, which turns unmethylated C into T at the queried G[C/T] position]
(1a) Infinium I
(1b) Infinium II
The good…
450k arrays are popular for large differential methylation studies
• Genome-wide
• High-throughput
• Single-nucleotide resolution
• Relatively inexpensive compared to other methods, e.g. sequencing
• Great for large studies with many samples
…the bad & the ugly
Large studies are particularly susceptible to unwanted variation
• Good experimental design can mitigate the effects of unwanted variation
• But the factors causing unwanted variation can be unknown
• And unwanted factors can sometimes be the largest source of variation (Fig. 2)
• In large studies, batch effects are often unavoidable because only a limited number of samples can be processed at any one time
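One common way to apply good experimental design when processing capacity is limited is to balance the groups of interest across batches, so that the factor of interest is not confounded with batch (unlike the deliberately unbalanced 18/4 split used in this poster's simulation). The sketch is an illustration only and not part of the RUV method: the function name and the round-robin scheme are our own choices, and `batch_size` stands for however many samples can be processed together (each 450k BeadChip holds 12 samples).

```python
import random


def balanced_batches(sample_groups, batch_size, seed=None):
    """Assign samples to batches so each batch gets a similar mix of groups.

    sample_groups: dict mapping sample ID -> group label (e.g. "newborn").
    batch_size:    maximum number of samples processed together.
    Returns a list of batches, each a list of sample IDs.
    """
    rng = random.Random(seed)
    # Collect and shuffle the samples within each group so allocation is randomised.
    by_group = {}
    for sample, group in sample_groups.items():
        by_group.setdefault(group, []).append(sample)
    queues = list(by_group.values())
    for members in queues:
        rng.shuffle(members)
    # Interleave the groups round-robin so consecutive samples alternate groups.
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop())
    # Cut the interleaved order into consecutive batches.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

With 20 newborns and 20 centenarians and a capacity of 12, every batch receives equal numbers of each group, so a batch term is no longer aligned with age group.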
Jovana Maksimovic¹, Terry Speed², Alicia Oshlack¹
¹ Murdoch Childrens Research Institute, RCH, Melbourne, Australia; ² Walter and Eliza Hall Institute, Melbourne, Australia
(4) RUV differential methylation analysis pipeline
Step 1: M-values + Illumina negative control probes → RUV → ranked list of DM CpGs
Step 2*: M-values + empirical controls → RUV → ranked list of DM CpGs
Using RUV in 450k differential methylation (DM) analysis
• RUV relies on negative controls to adjust for unwanted variation
• Illumina includes 614 negative control probes (“negs”) on the 450k array for background correction; we can use these as negative controls in RUV DM analysis (Fig. 4, Step 1) because they capture some technical variation between arrays (Fig. 5b & d)
• However, the “negs” only produce a background-level signal and
cannot capture unwanted biological variation
• Instead, the ranked list of CpGs generated by RUV DM analysis with “negs” can be used to empirically identify CpG probes that are not associated with the factor of interest (e.g. the bottom 50% of the list); these form a more representative set of negative controls (Fig. 4, Step 2), called empirical controls (“emp”)
• The results from RUV DM analysis with “emp” can then be used to further refine the set of empirical controls, if necessary (Fig. 4, Step 2)
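The two-step pipeline can be sketched as a simple loop. This is a hypothetical outline rather than the authors' R code: `fit_and_rank` stands in for any RUV DM analysis that accepts a set of control probe IDs and returns CpGs ordered from most to least significant, taking the bottom 50% follows the example given in the text, and the number of refinements is a free choice because Step 2 can be run zero or more times.

```python
from typing import Callable, List, Sequence


def empirical_controls(ranked_cpgs: Sequence[str], fraction: float = 0.5) -> List[str]:
    """Take the bottom `fraction` of a ranked list (least evidence of DM) as controls."""
    n_keep = int(len(ranked_cpgs) * fraction)
    return list(ranked_cpgs[len(ranked_cpgs) - n_keep:])


def ruv_pipeline(
    fit_and_rank: Callable[[List[str]], List[str]],
    illumina_negs: List[str],
    n_refinements: int = 1,
    fraction: float = 0.5,
) -> List[str]:
    """Run Step 1 with the Illumina "negs", then Step 2 with "emp" n_refinements times.

    fit_and_rank is a hypothetical callable: given control probe IDs, it runs an
    RUV DM analysis and returns CpG IDs ordered from most to least significant.
    """
    ranked = fit_and_rank(illumina_negs)  # Step 1: negative control probes
    for _ in range(n_refinements):  # Step 2*: empirical controls, repeated
        controls = empirical_controls(ranked, fraction)
        ranked = fit_and_rank(controls)
    return ranked
```

Keeping the RUV fit behind a callable keeps the control-selection logic separate from the choice of RUV flavour (e.g. RUV-inverse or RUV-ridge inverse).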
That’s the way RUV goes
(5a) MDS Plot: Truth set
(5b) Truth set: neg. control probes
(5c) MDS Plot: Test set
(5d) Test set: neg. control probes
(6) ROC Curve: Performance of RUV vs. other methods (x-axis: false positive rate; y-axis: true positive rate)
* This step can be performed zero or more times