RNA spike-in controls & analysis methods for trustworthy genome-scale measurements
Sarah A. Munro, Ph.D.
Genome-Scale Measurements Group
ABRF Meeting
March 29, 2015
Overview
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls
How can we have trustworthy gene expression results?
• We’re simultaneously measuring thousands of RNA molecules in gene expression experiments
• But are we getting it right?
External RNA Controls Consortium (ERCC) initiated by industry, hosted by NIST
• Initiated by Janet Warrington, VP Clinical Genomics at Affymetrix
• Open to all interested parties
• Voluntary
• More than 90 participants
– Industry, Academia, Government
– All major microarray technology developers
– Other gene expression assay developers
Spike-ins
ERCC control sequences are in NIST Standard Reference Material 2374
• DNA sequence library
• 96 unique control sequences in DNA plasmids
• Controls intended to mimic mammalian mRNA
• In vitro transcription to make RNA controls
NIST SRM 2374 and related data files are available directly from NIST @ http://tinyurl.com/erccsrm
Making ERCC ratio mixtures with true positive and true negative ratios
NIST plasmid DNA library → in vitro transcription → RNA transcripts → pooling → mixtures with known abundance ratios
…
Using ERCC ratio mixtures
(figure: Control (n>3) and Treated (n>3) samples)
Measurement process → Expression measures → Statistical analysis
• Multiple steps
• Many people & labs
• Takes days to weeks
Example gene expression data (figure: Treated vs. Control)
Are the RNA molecule ratios statistically different across the samples?
Evaluate technical performance with ERCC true positive and true negative ratios
Overview
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls
Use erccdashboard to produce standard performance metrics for any experiment
• R package is available from:
– Bioconductor
– NIST GitHub site
• Open source and open access for use in:
– Other analysis tools and pipelines
– Commercial software
Gauge technical performance with 4 erccdashboard figures
• Developed as part of SEQC study, with ABRF partners
• Technology-independent ratio performance measures
• Assessed differences in performance across:
– Experiments
– Laboratories
– Measurement processes
Munro, S. A. et al. Nature Communications 5:5125 doi: 10.1038/ncomms6125 (2014).
Ambion ERCC Ratio Mixtures
• 23 controls per subpool
• Design abundance spans a 2^20 range within each subpool
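The ratio-mixture design can be sketched as a small table of controls. This is a minimal sketch, assuming the commonly published Ambion ExFold layout of four subpools (A to D) with Mix 1 : Mix 2 ratios of 4:1, 1:1, 1:1.5 and 1:2; only the count of 23 controls per subpool and the 2^20 abundance span come from the slide. Control labels and amounts are hypothetical placeholders, not SRM 2374 values.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed Mix 1 : Mix 2 ratio for each subpool (ExFold-style design).
SUBPOOL_RATIOS = {"A": 4.0, "B": 1.0, "C": 1.0 / 1.5, "D": 0.5}
CONTROLS_PER_SUBPOOL = 23  # from the slide


@dataclass(frozen=True)
class Control:
    control_id: str     # hypothetical label such as "A00", not a real ERCC ID
    subpool: str
    mix1_amount: float  # arbitrary units
    mix2_amount: float

    @property
    def nominal_ratio(self) -> float:
        return self.mix1_amount / self.mix2_amount

    @property
    def is_true_positive(self) -> bool:
        # Subpools designed away from 1:1 are true positives;
        # the 1:1 subpool supplies the true negatives.
        return SUBPOOL_RATIOS[self.subpool] != 1.0


def design_mixtures(mix2_amounts: list[float]) -> list[Control]:
    """Build Mix 1 / Mix 2 amounts for every subpool from one abundance ladder."""
    if len(mix2_amounts) != CONTROLS_PER_SUBPOOL:
        raise ValueError(f"expected {CONTROLS_PER_SUBPOOL} amounts")
    controls = []
    for subpool, ratio in SUBPOOL_RATIOS.items():
        for i, amount in enumerate(mix2_amounts):
            controls.append(Control(f"{subpool}{i:02d}", subpool, amount * ratio, amount))
    return controls


# Hypothetical abundance ladder spanning 2^0 to 2^20 within each subpool.
ladder = [2.0 ** (20 * k / 22) for k in range(CONTROLS_PER_SUBPOOL)]
mixes = design_mixtures(ladder)
print(len(mixes), sum(c.is_true_positive for c in mixes))  # 92 69
```

Reusing one ladder for all four subpools keeps the sketch short; it only shows how true-positive and true-negative labels follow from the design ratios.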
Spike-in design for SEQC RNA Sequencing Experiments
• Rat Experiment: treated and control rat RNA, biological replicates
• Interlaboratory Experiment: human reference RNA samples, technical replicates
(figure: samples and replicates for sequencing)
What is the dynamic range of my experiment?
(figure: Rat Experiment and Interlaboratory Experiment panels; log2 normalized ERCC counts vs. log2 ERCC spike amount (attomol nt µg-1 total RNA))
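One way to summarize the dynamic-range plot numerically is to keep the controls detected in every replicate, report the span of their spike amounts, and fit a line of log2 normalized counts on log2 spike amount. This is a hypothetical helper, not an erccdashboard function; the "detected in all replicates" rule and the ordinary least-squares fit are assumptions made for illustration.

```python
from __future__ import annotations

import math


def dynamic_range(spike_log2: list[float], counts: list[list[float]]) -> dict:
    """Summarize the observed dynamic range of ERCC controls.

    spike_log2: log2 spike amount per control (attomol nt per ug total RNA).
    counts: normalized counts, one list of replicate values per control.
    A control counts as detected only if every replicate is > 0 (assumed rule).
    """
    xs, ys = [], []
    for x, reps in zip(spike_log2, counts):
        if reps and all(c > 0 for c in reps):
            xs.append(x)
            ys.append(sum(math.log2(c) for c in reps) / len(reps))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two detected controls")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("detected controls share a single spike amount")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return {
        "detected": n,
        "log2_span": max(xs) - min(xs),  # detected spike-amount range, log2 units
        "slope": slope,                  # near 1 if counts track input amount
        "intercept": my - slope * mx,
    }


summary = dynamic_range([0, 2, 4, 6], [[2, 2], [8, 8], [32, 32], [0, 5]])
print(summary["detected"], summary["log2_span"], summary["slope"])  # 3 4 1.0
```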
• Typical sequencing: ~40 million sequence reads per replicate
• Deep sequencing: ~260 million sequence reads per replicate
(figure: the same Rat Experiment and Interlaboratory Experiment panels, annotated by sequencing depth)
What was the diagnostic power?
(figure: ROC curves, true positive rate vs. false positive rate, for the Rat Experiment and Interlaboratory Experiment)
Area Under the Curve (AUC) depends on the number of controls detected!
AUC is a reasonable summary statistic…
But we’d like to evaluate our diagnostic performance as a function of abundance…
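AUC can be computed directly from per-control differential-expression p-values: a true-positive ratio should receive a smaller p-value than a true-negative (1:1) ratio. The sketch below uses the Mann-Whitney formulation of the ROC area and assumes only detected controls are passed in, which is why the AUC shifts with the number of controls detected. The function name and the p-values are hypothetical.

```python
from __future__ import annotations


def roc_auc(p_true_pos: list[float], p_true_neg: list[float]) -> float:
    """ROC AUC for separating true-positive from true-negative ERCC ratios.

    Equals the fraction of (positive, negative) pairs in which the positive
    control has the smaller p-value, counting ties as one half.
    """
    if not p_true_pos or not p_true_neg:
        raise ValueError("need at least one detected control of each kind")
    score = 0.0
    for p in p_true_pos:
        for q in p_true_neg:
            if p < q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(p_true_pos) * len(p_true_neg))


# Hypothetical p-values for detected 4:1 controls and detected 1:1 controls.
print(roc_auc([0.001, 0.02, 0.6], [0.3, 0.8]))  # 0.8333333333333334
```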
(figure: Rat Experiment MA plot; log2 normalized ratio of counts vs. log2 normalized average counts)
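The MA plot places each transcript or ERCC control at A = log2 of its average normalized count and M = log2 of its normalized count ratio between samples. A minimal sketch, assuming simple library-size (counts-per-million) normalization; the normalization used in the actual analysis may differ, and the helper and counts are hypothetical.

```python
from __future__ import annotations

import math


def ma_points(counts1: dict[str, float], counts2: dict[str, float],
              per: float = 1e6) -> dict[str, tuple[float, float]]:
    """Return {feature: (A, M)} for features counted in both samples.

    Counts are scaled to counts-per-`per` using each sample's total (an assumed
    library-size normalization). A = log2 mean normalized count, M = log2 ratio.
    """
    total1 = sum(counts1.values())
    total2 = sum(counts2.values())
    points = {}
    for feature in counts1.keys() & counts2.keys():
        c1, c2 = counts1[feature], counts2[feature]
        if c1 <= 0 or c2 <= 0:
            continue  # log ratio is undefined when either count is zero
        n1 = c1 / total1 * per
        n2 = c2 / total2 * per
        points[feature] = (math.log2((n1 + n2) / 2), math.log2(n1 / n2))
    return points


# Hypothetical counts: one ERCC control, one endogenous gene, one dropout.
sample1 = {"ERCC-A": 400, "gene1": 600, "gene2": 0}
sample2 = {"ERCC-A": 100, "gene1": 895, "gene2": 5}
for name, (a, m) in sorted(ma_points(sample1, sample2).items()):
    print(f"{name}: A={a:.2f}, M={m:.2f}")
```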
LODR: Limit of Detection of Ratios (Rat Experiment Reference RNA)
• Model P-values as a function of average signal
• Find P-value threshold based on chosen false discovery rate
– Here FDR = 0.1
– Default is FDR = 0.05
• Estimate LODR from intersection of model confidence interval upper bound and P-value threshold
(figure: DE test P-values vs. average counts)
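The three LODR steps above can be sketched in a few lines. This is a simplified stand-in, not the erccdashboard model: it assumes a Benjamini-Hochberg cutoff over all tested transcripts for the P-value threshold, a straight-line fit of log10 P-value on log10 average counts for the controls of one ratio, and a constant-width residual band in place of the model confidence interval. Both function names are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def bh_pvalue_threshold(pvalues: list[float], fdr: float = 0.05) -> float:
    """Largest p-value called significant by Benjamini-Hochberg at `fdr` (0.0 if none)."""
    m = len(pvalues)
    threshold = 0.0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k / m * fdr:
            threshold = p
    return threshold


def estimate_lodr(avg_counts: list[float], pvalues: list[float],
                  p_threshold: float, coverage: float = 0.9) -> float:
    """Average count where the upper band of a log-log P-value fit meets the threshold."""
    n = len(avg_counts)
    if n < 3 or n != len(pvalues):
        raise ValueError("need at least three paired (count, p-value) values")
    xs = [math.log10(c) for c in avg_counts]
    ys = [math.log10(p) for p in pvalues]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("average counts must vary")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid_sd = math.sqrt(sum((y - intercept - slope * x) ** 2
                             for x, y in zip(xs, ys)) / (n - 2))
    if slope >= 0:
        return math.inf  # p-values do not fall as signal rises: no LODR
    z = NormalDist().inv_cdf(coverage)  # one-sided upper band
    # Solve intercept + slope * x + z * resid_sd = log10(p_threshold) for x.
    x_cross = (math.log10(p_threshold) - intercept - z * resid_sd) / slope
    return 10 ** x_cross


threshold = bh_pvalue_threshold([0.001, 0.01, 0.03, 0.5], fdr=0.05)
# Hypothetical 4:1 controls whose p-values drop 100-fold per 10-fold rise in counts.
lodr = estimate_lodr([10, 100, 1000, 10000], [1e-1, 1e-3, 1e-5, 1e-7], 1e-4)
print(threshold, round(lodr, 1))  # 0.03 316.2
```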
LODR provides:
• Specified confidence in the differentially expressed transcripts above LODR (90% chance of <10% FDR)
• Guidance for experimental design: increase signal to bring transcripts of interest above the LODR estimate
(figure: DE test P-values vs. average counts)
(figure: Rat Experiment MA plot with the 4:1 LODR marked; log2 ratio of normalized counts vs. log2 normalized average counts)
Increased sequencing depth shifts endogenous transcript ratio measurements above LODR
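The depth effect can be approximated with simple arithmetic: if counts scale in proportion to the number of reads, moving from ~40 million to ~260 million reads multiplies each transcript's average count by 6.5. The sketch below assumes, for illustration only, that the LODR expressed in average counts stays fixed as depth changes; the helper name and example counts are hypothetical.

```python
from __future__ import annotations


def fraction_above_lodr(avg_counts: list[float], lodr: float,
                        depth_factor: float = 1.0) -> float:
    """Fraction of transcripts whose depth-scaled average count exceeds the LODR.

    Assumes counts scale linearly with sequencing depth and the LODR
    (in average counts) is unchanged by depth.
    """
    if not avg_counts:
        raise ValueError("no transcripts given")
    above = sum(1 for c in avg_counts if c * depth_factor > lodr)
    return above / len(avg_counts)


counts = [5, 20, 50, 200]  # hypothetical average counts at ~40 million reads
print(fraction_above_lodr(counts, lodr=40))                         # 0.5
print(fraction_above_lodr(counts, lodr=40, depth_factor=260 / 40))  # 0.75
```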
What are the LODR estimates for my experiment?
(figure: DE test P-values vs. average counts for the Rat Experiment and Interlaboratory Experiment)
How do the endogenous samples relate to LODR?
(figure: MA plots for the Rat Experiment and Interlaboratory Experiment with the 4:1 LODR marked; log2 ratio of normalized counts vs. log2 normalized average counts)
How much technical variability & bias is there?
(figure: log2 ratio of normalized counts for the Rat Experiment and Interlaboratory Experiment, annotated "Significant Ratio Bias" and "Decreased Variability")
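Bias and variability for each ratio subpool can be summarized from the observed ratios: bias as the mean log2 observed ratio minus the log2 nominal ratio, and variability as the standard deviation of the log2 observed ratios. This is a hypothetical helper with these assumed definitions, not the statistics erccdashboard reports.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def ratio_bias_and_variability(observed: dict[str, list[float]],
                               nominal: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Per-subpool (bias, sd) of log2 ratios.

    observed: subpool -> observed Sample 1 / Sample 2 ratios (e.g. one per control).
    nominal: subpool -> designed ratio.
    bias = mean log2 observed - log2 nominal; sd = spread of log2 observed.
    """
    summary = {}
    for subpool, ratios in observed.items():
        logs = [math.log2(r) for r in ratios]
        bias = mean(logs) - math.log2(nominal[subpool])
        sd = stdev(logs) if len(logs) > 1 else float("nan")
        summary[subpool] = (bias, sd)
    return summary


observed = {"4:1": [4.4, 3.6, 4.0], "1:1": [1.2, 0.9, 1.1]}  # hypothetical
nominal = {"4:1": 4.0, "1:1": 1.0}
for subpool, (bias, sd) in ratio_bias_and_variability(observed, nominal).items():
    print(f"{subpool}: bias={bias:+.3f}, sd={sd:.3f}")
```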
mRNA Fraction Differences Between Samples Contribute to Bias in ERCC Ratios
(figure: total RNA of Sample 1 and Sample 2 made up of mRNA, rRNA and spike-in; after mRNA enrichment, each sample retains mRNA and spike-in)
The RNA fractions are exaggerated for illustration purposes
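A toy calculation shows why a difference in mRNA fraction biases ERCC ratios. Assume, for illustration only, that ERCC RNA is added per µg of total RNA, that mRNA enrichment keeps all mRNA and all spike-in RNA while removing rRNA, that reads are allocated in proportion to RNA mass, and that counts are normalized by library size. Under those assumptions a 1:1 control reads as roughly (mRNA fraction of Sample 2) / (mRNA fraction of Sample 1). The function and numbers are hypothetical.

```python
def observed_ercc_ratio(nominal_ratio: float, mrna_frac1: float,
                        mrna_frac2: float, spike_frac: float = 1e-6) -> float:
    """Library-size-normalized Sample 1 / Sample 2 ratio for one ERCC control.

    Toy model: amounts are fractions of input total RNA mass, and this control
    is treated as the only spike-in. spike_frac is its mass in Sample 2;
    Sample 1 receives nominal_ratio times as much.
    """
    spike1 = spike_frac * nominal_ratio
    spike2 = spike_frac
    # After enrichment each library holds only mRNA plus spike-in RNA.
    library1 = mrna_frac1 + spike1
    library2 = mrna_frac2 + spike2
    # Normalized signal = share of the library taken up by this control.
    return (spike1 / library1) / (spike2 / library2)


# Sample 1 has twice the mRNA fraction of Sample 2 (exaggerated, as on the slide).
print(round(observed_ercc_ratio(1.0, 0.04, 0.02), 3))  # 0.5
print(round(observed_ercc_ratio(4.0, 0.04, 0.02), 3))  # 2.0
```

Every control is shifted by the same factor, so the bias shows up as a common offset in the log ratios rather than as noise.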
erccdashboard figures:
• LODR: Limit of Detection of Ratios
• Variability, bias, LODR & sample transcripts
• AUC: diagnostic performance
• Dynamic range
EVALUATE REPRODUCIBILITY ACROSS LABORATORIES
(figure: scale from good performance to poor performance)
Interlaboratory Analysis Using erccdashboard Performance Metrics
• Labs 1-6: Illumina + poly-A selection (Illumina kit)
• Labs 7-9: Life Tech + poly-A selection (Life Tech kit)
• Labs 10-12: Illumina + ribosomal RNA depletion
Consistent LODR across 11 of 12 Labs
• Diagnostic performance was consistent within and amongst measurement processes
• Lab 7 was an outlier for diagnostic performance
• LODR estimates agreed with AUC results
(figure: LODR (average counts) by laboratory)
Ratio bias is highly variable amongst experiments
• Ratio bias (r_m) can be attributed to the mRNA fraction difference between samples, where R_s = nominal subpool ratio and (E1/E2)_s = empirical ratio
• Large standard errors indicate that mRNA fraction isn’t the only factor contributing to ERCC ratio bias
– mRNA enrichment protocol is a factor…
(figure: log(r_m) by laboratory; reference: Shippy et al. 2006 mRNA fraction difference)
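A ratio-bias estimate and its standard error can be sketched from the quantities defined on the slide, R_s and (E1/E2)_s. This sketch assumes log(r_m) is estimated as the mean over controls of log (E1/E2)_s minus log R_s, with a standard error from the spread across controls; the estimator in the erccdashboard publication may differ in direction, weighting or log base. The function name and data are hypothetical.

```python
from __future__ import annotations

import math
from statistics import mean, stdev


def log_ratio_bias(empirical: list[float], nominal: list[float]) -> tuple[float, float]:
    """Estimate log(r_m) and its standard error from paired control ratios.

    empirical: (E1/E2)_s for each control s; nominal: R_s for the same controls.
    Uses the natural log (an assumption).
    """
    if len(empirical) != len(nominal) or len(empirical) < 2:
        raise ValueError("need at least two paired ratios")
    diffs = [math.log(e) - math.log(r) for e, r in zip(empirical, nominal)]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))


# Hypothetical empirical ratios for four controls from one laboratory.
log_rm, se = log_ratio_bias([3.1, 0.8, 0.52, 0.41], [4.0, 1.0, 1 / 1.5, 0.5])
print(f"log(r_m) = {log_rm:.3f} ± {se:.3f}")
```

A large standard error relative to log(r_m) is the pattern the slide describes: a single mRNA-fraction factor does not explain every control.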
Protocol-dependent bias from poly-A selection affects ERCC controls due to short poly-A tails
(figure: per-ERCC bias for Labs 1-6 ILM Poly-A, Labs 7-9 LIF Poly-A and Labs 10-12 ILM Ribo)
mRNA enrichment protocol biases vary across individual ERCCs but are consistent for a protocol
Results of the erccdashboard Publication
• Ratio performance measures for any technology platform and any experiment
– Diagnostic Power
– Novel LODR metric
– Technical Variability & Bias
• Comparison across experiments
• Quantification of mRNA fraction differences between samples
• Show protocol-dependent bias
Overview
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls
ERCC 2.0: A New Suite of RNA Controls
• Approached by industry and academia to build new RNA controls
• NIST-hosted open, public ERCC 2.0 workshop
– Workshop report and presentations available: slideshare.net/ERCC-Workshop
• All interested parties are welcome to participate
– Sequence contributions
– Interlaboratory analysis
• New and Improved mRNA Mimics
• Transcript Isoforms
• miRNA
New and Improved mRNA Mimics
• Additional controls
• Expand distributions of RNA control properties
– Length (> 2kb)
– GC content
– Poly-A tail length
Transcript Isoform Controls
• Transcript design
– Non-cognate Spike-in RNA Variants (SIRVs) developed by Lexogen
– Cognate sequence selection in progress
• Schizosaccharomyces pombe
• Mixture design
– Dynamic range: 2^4
– Design ratios < 2:1
Lukas Paul, Lexogen
Small and miRNA Controls
• Needed for validation of clinical applications
– Early Detection Research Network
– TGen
• Other applications relevant to bacterial RNA-Seq
• Non-cognate miRNA controls
• Include some pre-miRNA
• Direct RNA control synthesis by Agilent
– No need for DNA templates
Karol Thompson, FDA
Recap
• External RNA Controls Consortium (ERCC) RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA controls
Acknowledgements
• All External RNA Controls Consortium participants
• NIST
– Marc Salit
– Steve Lund
– P. Scott Pine
– Justin Zook
– David Duewer
– Jerod Parsons
– Jennifer McDaniel
– Margaret Klein
• Empa
– Matthias Roesslein
• SEQC study participants
• Co-authors on erccdashboard manuscript:
S. P. Lund, P. S. Pine, H. Binder, D. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Łabaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit
For more information contact: [email protected]