RNA Spike-in Controls and Analysis Methods for Trustworthy ...

RNA spike-in controls & analysis methods for trustworthy genome-scale measurements

Sarah A. Munro, Ph.D., Genome-Scale Measurements Group

ABRF Meeting, March 29, 2015

Overview

• External RNA Controls Consortium (ERCC) RNA spike-in controls

• ‘erccdashboard’ analysis tool

• ERCC 2.0: Building an updated suite of RNA controls

How can we have trustworthy gene expression results?

• We’re simultaneously measuring thousands of RNA molecules in gene expression experiments

• But are we getting it right?

External RNA Controls Consortium (ERCC) initiated by industry, hosted by NIST

• Initiated by Janet Warrington, VP Clinical Genomics at Affymetrix

• Open to all interested parties

• Voluntary

• More than 90 participants
  – Industry, Academia, Government
  – All major microarray technology developers
  – Other gene expression assay developers

Spike-ins

ERCC control sequences are in NIST Standard Reference Material 2374

• DNA sequence library

• 96 unique control sequences in DNA plasmids

• Controls intended to mimic mammalian mRNA

• In vitro transcription to make RNA controls

NIST SRM 2374 and related data files are available directly from NIST at http://tinyurl.com/erccsrm

Making ERCC ratio mixtures with true positive and true negative ratios

NIST Plasmid DNA Library → in vitro transcription → RNA transcripts → Pooling → Mixtures with known abundance ratios

…
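The pooling step can be sketched in Python. This is a minimal illustration, not the NIST or Ambion protocol: the four subpool ratios below (Mix 1 : Mix 2 of 4:1, 1:1, 2:3 and 1:2) are an assumption modeled on the commercial ERCC ratio mixtures, and the subpool labels, control IDs, amounts and the function name `pool_mixtures` are hypothetical.

```python
from fractions import Fraction

# Assumed Mix 1 : Mix 2 ratio for each subpool (hypothetical labels A-D).
SUBPOOL_RATIOS = {
    "A": Fraction(4, 1),
    "B": Fraction(1, 1),
    "C": Fraction(2, 3),
    "D": Fraction(1, 2),
}


def pool_mixtures(mix2_amounts, subpool_of):
    """Return (mix1, mix2) dicts of per-control amounts.

    mix2_amounts: control ID -> amount of that control in Mix 2
    subpool_of:   control ID -> subpool label (a key of SUBPOOL_RATIOS)
    Every control in a subpool gets the same Mix 1 / Mix 2 ratio, so the
    ratio is known by design.
    """
    mix1, mix2 = {}, {}
    for control, amount in mix2_amounts.items():
        ratio = SUBPOOL_RATIOS[subpool_of[control]]
        mix2[control] = Fraction(amount)
        mix1[control] = Fraction(amount) * ratio
    return mix1, mix2


# Hypothetical controls: ctl1 in subpool A, ctl2 in subpool D.
mix1, mix2 = pool_mixtures({"ctl1": 100, "ctl2": 30}, {"ctl1": "A", "ctl2": "D"})
print(mix1["ctl1"] / mix2["ctl1"], mix1["ctl2"] / mix2["ctl2"])  # 4 1/2
```

Under this design the 1:1 subpool supplies the true negative ratios and the other three supply true positive ratios. Exact fractions are used only so the designed ratios survive arithmetic without rounding.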

Using ERCC ratio mixtures

Control (n>3), Treated (n>3)

Measurement process → Expression measures → Statistical analysis

• Multiple steps
• Many people & labs
• Takes days to weeks

Example gene expression data (Treated vs. Control)

Are the RNA molecule ratios statistically different across the samples?

Evaluate technical performance with ERCC true positive and true negative ratios
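Scoring controls against their known ratios can be sketched as follows. Assumptions: p-values come from whatever differential expression test the pipeline uses, alpha = 0.05 is an arbitrary cutoff, and `tally_calls` is a hypothetical name, not part of erccdashboard.

```python
def tally_calls(nominal_ratios, p_values, alpha=0.05):
    """Count true/false positives/negatives for ERCC controls.

    nominal_ratios: control ID -> nominal Mix 1 / Mix 2 ratio
    p_values:       control ID -> p-value from a differential test
    A nominal ratio != 1 means the control truly differs (positive);
    a nominal ratio of 1 means it does not (negative). Controls missing
    from p_values (e.g. not detected) are skipped.
    """
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for control, ratio in nominal_ratios.items():
        if control not in p_values:
            continue
        called = p_values[control] < alpha
        if ratio != 1:
            counts["TP" if called else "FN"] += 1
        else:
            counts["FP" if called else "TN"] += 1
    return counts


# Hypothetical data; control "e" was not detected.
ratios = {"a": 4, "b": 4, "c": 1, "d": 1, "e": 0.5}
pvals = {"a": 0.001, "b": 0.2, "c": 0.01, "d": 0.6}
counts = tally_calls(ratios, pvals)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
false_positive_rate = counts["FP"] / (counts["FP"] + counts["TN"])
print(counts, sensitivity, false_positive_rate)
```

Sweeping `alpha` from 0 to 1 and plotting sensitivity against false positive rate traces out an ROC curve.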

Overview




Use erccdashboard to produce standard performance metrics for any experiment

• R package is available from:
  – Bioconductor
  – NIST GitHub site

• Open source and open access for use in:
  – Other analysis tools and pipelines
  – Commercial software

Gauge technical performance with 4 erccdashboard figures

• Developed as part of SEQC study, with ABRF partners

• Technology-independent ratio performance measures

• Assessed differences in performance across:
  – Experiments
  – Laboratories
  – Measurement processes

Munro, S. A. et al. Nature Communications 5:5125 doi: 10.1038/ncomms6125 (2014).

Ambion ERCC Ratio Mixtures

• 23 controls per subpool

• Design abundance spans a 2^20 range (about 10^6-fold) within each subpool

Spike-in design for SEQC RNA Sequencing Experiments

• Rat Experiment: treated and control rat RNA, biological replicates

• Interlaboratory Experiment: human reference RNA samples, technical replicates

Sample replicates for sequencing

What is the dynamic range of my experiment?

[Figure: Rat Experiment and Interlaboratory Experiment panels; y axis: Log2 Normalized ERCC Counts; x axis: Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA)]

• Typical sequencing: ~40 million sequence reads per replicate

• Deep sequencing: ~260 million sequence reads per replicate

[Figure: dynamic range at each sequencing depth; y axis: Log2 Normalized ERCC Counts; x axis: Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA)]
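One way to summarize these panels numerically, sketched under assumptions: regress log2 normalized counts on log2 spike amount for the detected controls, and report the log2 span of spike amounts that were detected. The detection cutoff `min_count` and the function name `dose_response` are hypothetical; erccdashboard's own dynamic range figure may be computed differently. On log2-log2 axes, a slope near 1 means the signal is proportional to the input amount.

```python
import math


def dose_response(spike_amounts, counts, min_count=1):
    """Least-squares fit of log2(counts) on log2(spike amount).

    Only controls with counts >= min_count are treated as detected.
    Returns (slope, intercept, detected_range_log2).
    """
    pts = [(math.log2(a), math.log2(c))
           for a, c in zip(spike_amounts, counts) if c >= min_count]
    if len(pts) < 2:
        raise ValueError("need at least two detected controls")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sxx
    intercept = my - slope * mx
    return slope, intercept, max(xs) - min(xs)


# Hypothetical controls: counts double with input; the lowest one is undetected.
slope, intercept, span = dose_response([1, 2, 4, 8, 16], [0, 10, 20, 40, 80])
print(slope, span)
```

Deeper sequencing tends to push more low-abundance controls past `min_count`, which widens the detected span.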

What was the diagnostic power?

[Figure: ROC curves for the Rat Experiment and Interlaboratory Experiment; y axis: True Positive Rate; x axis: False Positive Rate]

Area Under the Curve (AUC) depends on the number of controls detected!
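The dependence on detection follows from how AUC is computed. A minimal sketch, assuming AUC is calculated for one ratio subpool against the 1:1 (true negative) controls with p-values as the ranking score; it uses the Mann-Whitney identity (AUC is the probability that a random true positive gets a smaller p-value than a random true negative, ties counting one half). The function name `ratio_auc` is hypothetical. Undetected controls have no p-value, so they drop out of both groups.

```python
def ratio_auc(p_values, nominal_ratios, target_ratio):
    """AUC for controls at target_ratio (positives) vs. 1:1 controls (negatives).

    p_values:       control ID -> p-value, for detected controls only
    nominal_ratios: control ID -> nominal ratio
    """
    positives = [p for c, p in p_values.items() if nominal_ratios[c] == target_ratio]
    negatives = [p for c, p in p_values.items() if nominal_ratios[c] == 1]
    if not positives or not negatives:
        raise ValueError("need at least one detected control in each group")
    score = 0.0
    for pp in positives:
        for pn in negatives:
            if pp < pn:
                score += 1.0      # positive ranked ahead of negative
            elif pp == pn:
                score += 0.5      # tie counts as half
    return score / (len(positives) * len(negatives))


ratios = {"a": 4, "b": 4, "c": 1, "d": 1}
print(ratio_auc({"a": 0.01, "b": 0.02, "c": 0.5, "d": 0.01}, ratios, 4))
```

With only a handful of detected controls per group, a single control moving in or out of detection shifts the AUC noticeably.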

AUC is a reasonable summary statistic…

But we’d like to evaluate our diagnostic performance as a function of abundance…

[Figure: Rat Experiment MA plot; y axis: Log2 Ratio of Normalized Counts; x axis: Log2 Normalized Average Counts]
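Each point on an MA plot is a pair (A, M). A minimal sketch, assuming replicate counts are normalized by dividing by per-replicate size factors (how those factors are estimated is left open), M is the log2 ratio of mean normalized counts, and A is the log2 of their average. The function name `ma_point` is hypothetical.

```python
import math


def ma_point(counts1, counts2, size1, size2):
    """Return (M, A) for one transcript or control.

    counts1, counts2: replicate counts in sample 1 and sample 2
    size1, size2:     per-replicate normalization (size) factors
    """
    norm1 = [c / s for c, s in zip(counts1, size1)]
    norm2 = [c / s for c, s in zip(counts2, size2)]
    mean1 = sum(norm1) / len(norm1)
    mean2 = sum(norm2) / len(norm2)
    m = math.log2(mean1 / mean2)          # log2 ratio of normalized counts
    a = math.log2((mean1 + mean2) / 2)    # log2 normalized average counts
    return m, a


m, a = ma_point([40, 80], [10, 10], [1.0, 2.0], [1.0, 1.0])
print(m)  # 2.0
```

Without bias, a control from a 4:1 subpool should sit near M = 2 at any A; spread around that line at low A is what the LODR analysis quantifies.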

LODR: Limit of Detection of Ratios (Rat Experiment and Reference RNA panels)

• Model P-values as a function of average signal

• Find P-value threshold based on chosen false discovery rate (FDR)
  – Here FDR = 0.1
  – Default is FDR = 0.05

• Estimate LODR from intersection of model confidence interval upper bound and P-value threshold

[Figure: DE test P-values vs. average counts]
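The three steps can be sketched in Python under simplifying assumptions that differ from erccdashboard: the P-value threshold comes from the Benjamini-Hochberg rule, the model is a straight line of log10 P-value on log10 average counts rather than a more flexible fit, and the "upper bound" is the fitted line plus a one-sided normal quantile times the residual standard deviation. The names `bh_threshold` and `estimate_lodr` are hypothetical, and all P-values must be positive.

```python
import math
from statistics import NormalDist


def bh_threshold(p_values, fdr=0.05):
    """Largest p-value rejected by the Benjamini-Hochberg step-up rule."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = 0.0
    for k, p in enumerate(ranked, start=1):
        if p <= k / m * fdr:
            threshold = p  # keep the largest p that passes its BH cutoff
    return threshold


def estimate_lodr(avg_counts, p_values, p_threshold, confidence=0.9):
    """Average count above which the upper bound on log10(p) falls below
    log10(p_threshold); None if p-values do not decrease with signal.
    Needs at least 3 controls (residual SD uses n - 2)."""
    x = [math.log10(c) for c in avg_counts]
    y = [math.log10(p) for p in p_values]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    if slope >= 0:
        return None
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile, about 1.28
    # Upper bound is intercept + slope * x + z * sd; solve for where it
    # equals log10(p_threshold).
    log_lodr = (math.log10(p_threshold) - intercept - z * sd) / slope
    return 10 ** log_lodr


# Hypothetical data for one ratio subpool.
threshold = bh_threshold([0.001, 0.01, 0.03, 0.5], fdr=0.05)
lodr = estimate_lodr([1, 10, 100, 1000], [1.0, 1e-2, 1e-4, 1e-6], threshold)
print(threshold, round(lodr, 2))
```

Noisier controls inflate `sd`, which pushes the upper bound up and the LODR toward higher average counts.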

LODR provides:

• Specified confidence in the differentially expressed transcripts above the LODR (90% chance of <10% FDR)

• Guidance for experimental design: increase signal so that more transcripts lie above the LODR estimate
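Applying an LODR to endogenous transcripts can be sketched as a filter. Assumptions: each transcript has an average normalized count, a log2 fold change, and a P-value; a single LODR estimated for one ratio (here 4:1) is applied only to transcripts whose fold change is at least that large; `confident_de_calls` is a hypothetical name.

```python
import math


def confident_de_calls(transcripts, p_threshold, lodr, lodr_ratio=4.0):
    """Transcripts called DE with LODR-level confidence.

    transcripts: name -> (average normalized count, log2 fold change, p-value)
    Keeps transcripts that pass the p-value threshold, lie at or above the
    LODR, and change by at least the ratio the LODR was estimated for.
    """
    min_lfc = math.log2(lodr_ratio)
    return sorted(
        name for name, (avg, lfc, p) in transcripts.items()
        if p <= p_threshold and avg >= lodr and abs(lfc) >= min_lfc
    )


calls = confident_de_calls(
    {"g1": (50, 2.5, 0.001),    # passes all three filters
     "g2": (5, 3.0, 0.001),     # below the LODR
     "g3": (100, 0.5, 0.001),   # fold change smaller than 4:1
     "g4": (200, -2.2, 0.2)},   # not significant
    p_threshold=0.01, lodr=20)
print(calls)  # ['g1']
```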

[Figure: Rat Experiment MA plots annotated with the 4:1 LODR; y axis: Log2 Ratio of Normalized Counts]

Increased sequencing depth shifts endogenous transcript ratio measurements above the LODR
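The depth effect can be approximated with simple arithmetic. Assuming counts scale in proportion to the number of reads and the LODR, expressed in average counts, stays roughly fixed, moving from ~40 million to ~260 million reads per replicate multiplies every transcript's counts by 6.5. The LODR value, transcript counts and the name `fraction_above_lodr` are hypothetical.

```python
def fraction_above_lodr(avg_counts, lodr, depth_factor=1.0):
    """Fraction of transcripts at or above the LODR after scaling counts.

    depth_factor: new read depth / old read depth (assumes counts scale
    linearly with depth and the LODR in counts does not change).
    """
    scaled = [c * depth_factor for c in avg_counts]
    return sum(1 for c in scaled if c >= lodr) / len(scaled)


typical, deep = 40e6, 260e6
counts_at_typical = [5, 10, 30, 100]  # hypothetical average counts
lodr = 20                              # hypothetical LODR, in average counts
print(fraction_above_lodr(counts_at_typical, lodr))                  # 0.5
print(fraction_above_lodr(counts_at_typical, lodr, deep / typical))  # 1.0
```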

What are the LODR estimates for my experiment?

[Figure: DE test P-values vs. average counts for each experiment]

How do the endogenous samples relate to LODR?

[Figure: MA plots annotated with the 4:1 LODR for each experiment; y axis: Log2 Ratio of Normalized Counts]

How much technical variability & bias is there?

[Figure: panels labeled "Significant Ratio Bias" and "Decreased Variability"; y axis: Log2 Ratio of Normalized Counts]

mRNA Fraction Differences Between Samples Contribute to Bias in ERCC Ratios

[Figure: total RNA (mRNA, rRNA, spike-in) in Sample 1 and Sample 2, and the mRNA and spike-in remaining in each sample after mRNA enrichment]

The RNA fractions are exaggerated for illustration purposes.
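The mechanism can be made concrete with an idealized calculation. Assumptions: equal total RNA input, the spike-in is added as a fixed fraction of total RNA, and mRNA enrichment keeps all mRNA and all spike-in while removing everything else. Under these assumptions a 1:1 control in a sample with twice the mRNA fraction appears at roughly 1:2. The function name `observed_spike_ratio` and the numbers are hypothetical.

```python
def observed_spike_ratio(spike1, spike2, mrna_frac1, mrna_frac2):
    """Sample 1 / sample 2 ratio of a spike-in's share of enriched RNA.

    spike1, spike2:         spike-in mass as a fraction of total RNA
    mrna_frac1, mrna_frac2: mRNA mass as a fraction of total RNA
    After idealized enrichment only mRNA and spike-in remain, so the
    spike-in's share is spike / (spike + mRNA fraction).
    """
    share1 = spike1 / (spike1 + mrna_frac1)
    share2 = spike2 / (spike2 + mrna_frac2)
    return share1 / share2


# A 1:1 control (same spike amount in both samples) with sample 1
# carrying twice the mRNA fraction of sample 2.
print(round(observed_spike_ratio(1e-6, 1e-6, 0.04, 0.02), 3))  # 0.5
```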

• Dynamic Range

• AUC: Diagnostic performance

• LODR: Limit of Detection of Ratios

• Variability, Bias, LODR & Sample Transcripts

EVALUATE REPRODUCIBILITY ACROSS LABORATORIES

[Figure panels: Good Performance; Poor Performance]

Interlaboratory Analysis Using erccdashboard performance metrics

• Labs 1-6: Illumina + poly-A selection (Illumina kit)

• Labs 7-9: Life Tech + poly-A selection (Life Tech kit)

• Labs 10-12: Illumina + ribosomal RNA depletion

Consistent LODR across 11 of 12 Labs

• Diagnostic performance was consistent within and amongst measurement processes

• Lab 7 was an outlier for diagnostic performance

• LODR results agreed with AUC

[Figure: LODR (average counts) by laboratory]

Ratio bias is highly variable amongst experiments

• Ratio bias (r_m) can be attributed to the mRNA fraction difference between samples, where R_s is the nominal subpool ratio and (E1/E2)_s is the empirical ratio

• Large standard errors indicate that mRNA fraction isn’t the only factor contributing to ERCC ratio bias
  – mRNA enrichment protocol is a factor…

[Figure: log(r_m) by laboratory, with mRNA fraction difference (Shippy et al. 2006)]
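A simplified estimate of ratio bias and its standard error, sketched under assumptions: bias is treated as a single multiplicative factor shared by all controls and estimated as the geometric mean of (E1/E2)_s / R_s. This is not the model used in the erccdashboard publication, and its direction convention may differ from r_m. The function name `ratio_bias` and the numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def ratio_bias(empirical, nominal):
    """Return (bias_factor, sd_log2, se_log2) across ERCC controls.

    empirical: control ID -> measured ratio (E1/E2)_s
    nominal:   control ID -> nominal subpool ratio R_s
    Each control contributes log2(empirical / nominal); the bias factor
    is 2 ** mean of those values (a geometric mean of empirical/nominal).
    """
    devs = [math.log2(empirical[c] / nominal[c]) for c in empirical]
    sd = stdev(devs)                  # needs at least two controls
    se = sd / math.sqrt(len(devs))    # standard error of the mean log2 bias
    return 2 ** mean(devs), sd, se


# Hypothetical controls, all measured at 0.75x their nominal ratio.
bias, sd, se = ratio_bias({"a": 3.0, "b": 0.75, "c": 1.5}, {"a": 4, "b": 1, "c": 2})
print(round(bias, 3), sd)
```

A large `se` relative to the bias is the pattern the slide describes: controls disagree about the shift, so a single mRNA fraction factor does not explain it all.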

Protocol-dependent bias from poly-A selection affects ERCC controls due to short poly-A tails

[Figure panels: Labs 1-6 ILM Poly-A; Labs 7-9 LIF Poly-A; Labs 10-12 ILM Ribo]

mRNA enrichment protocol biases vary across individual ERCCs but are consistent for a protocol

Results of the erccdashboard Publication

• Ratio performance measures for any technology platform and any experiment
  – Diagnostic power
  – Novel LODR metric
  – Technical variability & bias

• Comparison across experiments

• Quantification of mRNA fraction differences between samples

• Show protocol-dependent bias

Overview




ERCC 2.0: A New Suite of RNA Controls

• Approached by industry and academia to build new RNA controls

• NIST-hosted open, public ERCC 2.0 workshop
  – Workshop report and presentations available: slideshare.net/ERCC-Workshop

• All interested parties are welcome to participate
  – Sequence contributions
  – Interlaboratory analysis

• New and Improved mRNA Mimics

• Transcript Isoforms

• miRNA

New and Improved mRNA Mimics

• Additional controls

• Expand distributions of RNA control properties
  – Length (> 2 kb)
  – GC content
  – Poly-A tail length
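Screening candidate control sequences for these properties can be sketched as below. Assumptions: each candidate is a single DNA or RNA string with its poly-A tail at the 3' end, and GC content is computed on the body excluding the tail so that tail length does not dilute it. The function name `control_properties` is hypothetical.

```python
def control_properties(seq):
    """Length, GC fraction (excluding the 3' poly-A tail), and tail length."""
    s = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    body = s.rstrip("A")               # strip the 3' run of A bases
    tail = len(s) - len(body)
    gc = sum(base in "GC" for base in body) / len(body) if body else 0.0
    return {"length": len(s), "gc_fraction": gc, "polya_length": tail}


print(control_properties("GGCCAU" + "A" * 4))
```

Because the tail is found by stripping trailing A bases, any A bases at the very end of the body are counted as tail; a real screen would need the designed tail boundary.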

Transcript Isoform Controls

• Transcript design
  – Non-cognate Spike-in RNA Variants (SIRVs) developed by Lexogen
  – Cognate sequence selection in progress
    • Schizosaccharomyces pombe

• Mixture design
  – Dynamic range: 2^4
  – Design ratios: < 2:1

Lukas Paul, Lexogen

Small and miRNA Controls

• Needed for validation of clinical applications
  – Early Detection Research Network
  – TGen

• Other applications relevant to bacterial RNA-Seq

• Non-cognate miRNA controls

• Include some pre-miRNA

• Direct RNA control synthesis by Agilent: no need for DNA templates

Karol Thompson, FDA

Recap




Acknowledgements

• All External RNA Controls Consortium participants

• NIST: Marc Salit, Steve Lund, P. Scott Pine, Justin Zook, David Duewer, Jerod Parsons, Jennifer McDaniel, Margaret Klein

• Empa: Matthias Roesslein

• SEQC study participants

• Co-authors on erccdashboard manuscript:

S. P. Lund, P. S. Pine, H. Binder, D. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Łabaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit

For more information contact: [email protected]
