RNA Spike-in Controls and Analysis Methods for Trustworthy ...

RNA spike-in controls & analysis methods for trustworthy genome-scale measurements Sarah A. Munro, Ph.D. Genome-Scale Measurements Group ABRF Meeting March 29, 2015



Page 2

Overview

• External RNA Controls Consortium (ERCC) RNA spike-in controls

• ‘erccdashboard’ analysis tool

• ERCC 2.0: Building an updated suite of RNA controls

Page 4

How can we have trustworthy gene expression results?

• We’re simultaneously measuring thousands of RNA molecules in gene expression experiments

• But are we getting it right?

Page 5

External RNA Controls Consortium (ERCC) initiated by industry, hosted by NIST

• Initiated by Janet Warrington, VP Clinical Genomics at Affymetrix

• Open to all interested parties

• Voluntary

• More than 90 participants
  – Industry, Academia, Government
  – All major microarray technology developers
  – Other gene expression assay developers

Page 6

ERCC control sequences are in NIST Standard Reference Material 2374

• DNA sequence library

• 96 unique control sequences in DNA plasmids

• Controls intended to mimic mammalian mRNA

• In vitro transcription to make RNA controls

NIST SRM 2374 and related data files are available directly from NIST at http://tinyurl.com/erccsrm

Page 7

Making ERCC ratio mixtures with true positive and true negative ratios

NIST plasmid DNA library → in vitro transcription → RNA transcripts → pooling → mixtures with known abundance ratios
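Because every control's Mix 1 : Mix 2 ratio is fixed by design, each control can be labeled before any data are collected: a 1:1 subpool supplies true negatives and every other subpool supplies true positives. The minimal Python sketch below shows that bookkeeping. The four subpool ratios (4:1, 1:1, 1:1.5 and 1:2), the `label_controls` helper and the control IDs are illustrative assumptions, not values given on this slide.

```python
# Hypothetical sketch: label spike-in controls as true positives or true negatives
# from the nominal Mix 1 : Mix 2 ratio of the subpool each control belongs to.

# Assumed nominal ratios per subpool (an assumption; replace with your mixture's design).
NOMINAL_RATIOS = {"A": 4.0, "B": 1.0, "C": 2.0 / 3.0, "D": 0.5}


def label_controls(subpool_of: dict[str, str]) -> dict[str, bool]:
    """Return True for true positives (ratio other than 1:1), False for true negatives."""
    labels = {}
    for control_id, subpool in subpool_of.items():
        ratio = NOMINAL_RATIOS[subpool]
        # A 1:1 subpool contributes true negatives; any other ratio is a true positive.
        labels[control_id] = ratio != 1.0
    return labels


# Hypothetical control IDs; real IDs and subpool assignments come from the mixture documentation.
example = {"ctrl_01": "A", "ctrl_02": "B", "ctrl_03": "C", "ctrl_04": "D"}
labels = label_controls(example)
print(labels)  # {'ctrl_01': True, 'ctrl_02': False, 'ctrl_03': True, 'ctrl_04': True}
```

With 23 controls per subpool, as in the design described later in the deck, this yields a fixed set of known true positives and true negatives for every experiment.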

Page 8

Using ERCC ratio mixtures

Control (n>3), Treated (n>3)

Page 11

Using ERCC ratio mixtures

Control (n>3), Treated (n>3)

Measurement process → Expression measures → Statistical analysis

• Multiple steps

• Many people & labs

• Takes days to weeks

Page 12

Example gene expression data

[Figure: expression data for Treated and Control samples]

Page 13

Are the RNA molecule ratios statistically different across the samples?

[Figure: Treated vs. Control]

Page 14

Evaluate technical performance with ERCC true positive and true negative ratios

[Figure: Treated vs. Control]

Page 15

Overview

• External RNA Controls Consortium (ERCC) RNA spike-in controls

• ‘erccdashboard’ analysis tool

• ERCC 2.0: Building an updated suite of RNA controls

Page 16

Use erccdashboard to produce standard performance metrics for any experiment

• R package is available from:
  – Bioconductor
  – NIST GitHub site

• Open source and open access for use in:
  – Other analysis tools and pipelines
  – Commercial software

Page 17

Gauge technical performance with 4 erccdashboard figures

• Developed as part of the SEQC study, with ABRF partners

• Technology-independent ratio performance measures

• Assessed differences in performance across:
  – Experiments
  – Laboratories
  – Measurement processes

Munro, S. A. et al. Nature Communications 5:5125 doi: 10.1038/ncomms6125 (2014).

Page 18

Ambion ERCC Ratio Mixtures

• 23 controls per subpool

• Design abundance spans a 2^20 range within each subpool

Page 19

Spike-in design for SEQC RNA sequencing experiments

• Rat experiment: treated and control rat RNA, biological replicates

• Interlaboratory experiment: human reference RNA samples, technical replicates

[Figure: sample replicates for sequencing]

Page 20

What is the dynamic range of my experiment?

[Figure: Rat Experiment and Interlaboratory Experiment panels; y-axis: Log2 Normalized ERCC Counts; x-axis: Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA)]

Page 21

What is the dynamic range of my experiment?

• Typical sequencing: ~40 million sequence reads per replicate

• Deep sequencing: ~260 million sequence reads per replicate

[Figure: Rat Experiment and Interlaboratory Experiment panels; y-axis: Log2 Normalized ERCC Counts; x-axis: Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA)]
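One way to read the dynamic-range plot is as a log2-log2 regression of normalized ERCC counts on spike amount: a slope near 1 means counts track input proportionally, and the span of detected spike amounts is the usable range. The sketch below is a plain ordinary-least-squares stand-in built around a hypothetical `dynamic_range_fit` helper; it is not necessarily the model erccdashboard draws, and the input numbers are invented for illustration.

```python
import math


def dynamic_range_fit(spike_amounts, normalized_counts):
    """Least-squares fit of log2(normalized count) on log2(spike amount).

    Controls with zero counts are treated as undetected and excluded, since
    log2(0) is undefined. Returns (slope, intercept, r_squared, n_detected).
    """
    pairs = [(math.log2(a), math.log2(c))
             for a, c in zip(spike_amounts, normalized_counts) if c > 0]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three detected controls")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared, n


# Hypothetical input: amounts in attomol nt per ug total RNA; the two lowest are undetected.
amounts = [2.0 ** k for k in range(10)]
counts = [0.0, 0.0] + [4.0 * a for a in amounts[2:]]
slope, intercept, r2, n_detected = dynamic_range_fit(amounts, counts)
print(slope, intercept, r2, n_detected)  # 1.0 2.0 1.0 8
```

Dropping zero-count controls keeps the fit defined, but it also means the fitted range reflects only what was detected, which is why deeper sequencing can extend it.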

Page 22

What was the diagnostic power?

[Figure: ROC curves for the Rat Experiment and Interlaboratory Experiment; y-axis: True Positive Rate; x-axis: False Positive Rate]

Page 23

What was the diagnostic power?

[Figure: ROC curves for the Rat Experiment and Interlaboratory Experiment; y-axis: True Positive Rate; x-axis: False Positive Rate]

The Area Under the Curve (AUC) depends on the number of controls detected!
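AUC can be computed from the differential expression (DE) test p-values of the detected controls plus their true positive / true negative labels. The sketch uses the Mann-Whitney form of AUC, the chance that a true-positive control ranks ahead of (has a smaller p-value than) a true-negative one. The `roc_auc` helper and the toy p-values are hypothetical, and erccdashboard may compute the curve differently. Because undetected controls are dropped before the calculation, the score depends on how many controls were detected.

```python
def roc_auc(p_values, is_true_positive):
    """AUC for ranking controls by DE-test p-value (smaller = more significant).

    Computed as the Mann-Whitney probability that a randomly chosen true-positive
    control has a smaller p-value than a randomly chosen true-negative control,
    counting ties as one half. Undetected controls should be removed beforehand.
    """
    pos = [p for p, t in zip(p_values, is_true_positive) if t]
    neg = [p for p, t in zip(p_values, is_true_positive) if not t]
    if not pos or not neg:
        raise ValueError("need at least one true positive and one true negative")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical p-values for six detected controls.
p = [0.001, 0.01, 0.2, 0.5, 0.03, 0.8]
truth = [True, True, True, False, False, False]
auc = roc_auc(p, truth)
print(round(auc, 3))  # 0.889
```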

Page 24

AUC is a reasonable summary statistic…

But we’d like to evaluate our diagnostic performance as a function of abundance…

Page 25

[Figure: Rat Experiment MA plot; y-axis: Log2 Normalized Ratio of Counts; x-axis: Log2 Normalized Average Counts]
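Evaluating performance as a function of abundance starts from the MA plot coordinates: A on the x-axis (log2 normalized average counts) and M on the y-axis (log2 normalized ratio of counts). For a spike-in, M can be compared directly with log2 of its nominal ratio, and the gap is the ratio bias. The sketch assumes counts are already normalized and that "average counts" means the mean over both samples; `ma_point` and the replicate values are hypothetical, and erccdashboard may define these quantities slightly differently.

```python
import math
from statistics import fmean


def ma_point(counts_sample_1, counts_sample_2):
    """Return (A, M) for one transcript from replicate normalized counts.

    A = log2 of the average normalized count over both samples (x-axis)
    M = log2 of the ratio of the two sample means (y-axis)
    """
    mean_1 = fmean(counts_sample_1)
    mean_2 = fmean(counts_sample_2)
    a_value = math.log2((mean_1 + mean_2) / 2)
    m_value = math.log2(mean_1 / mean_2)
    return a_value, m_value


# Hypothetical spike-in from a 4:1 subpool, three replicates per sample.
a_value, m_value = ma_point([40.0, 44.0, 36.0], [10.0, 11.0, 9.0])
nominal_m = math.log2(4.0)
print(round(a_value, 3), m_value, m_value - nominal_m)  # 4.644 2.0 0.0
```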

Page 26

LODR: Limit of Detection of Ratios

• Model P-values as a function of average signal

• Find the P-value threshold for the chosen false discovery rate (FDR)
  – Here FDR = 0.1
  – Default is FDR = 0.05

• Estimate the LODR from the intersection of the upper bound of the model confidence interval with the P-value threshold

[Figure: Rat Experiment, Reference RNA; y-axis: DE Test P-values; x-axis: Average Counts]
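The LODR recipe has two steps: turn the chosen FDR into a p-value threshold, then find the average count at which the upper bound of the modeled p-values crosses that threshold. The sketch below is a deliberately simplified stand-in. It uses Benjamini-Hochberg for the first step and a straight line in log10 space, shifted up by z residual standard deviations, for the second. Both choices, the helper names `bh_p_threshold` and `estimate_lodr`, and the toy data are assumptions, not erccdashboard's implementation.

```python
import math
from statistics import fmean


def bh_p_threshold(p_values, fdr=0.05):
    """Largest p-value accepted by the Benjamini-Hochberg step-up procedure.

    Returns 0.0 when no p-value passes.
    """
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = 0.0
    for k, p in enumerate(ranked, start=1):
        if p <= k / m * fdr:
            threshold = p  # keep the largest passing p-value (step-up rule)
    return threshold


def estimate_lodr(avg_counts, p_values, p_threshold, z=1.2816):
    """Average count where an upper bound on the fitted log10 p-value meets the threshold.

    Simplified model (an assumption, not erccdashboard's fit): a straight line
    log10(p) = a + b*log10(count), shifted up by z residual standard deviations
    (z = 1.2816 is the one-sided 90% normal quantile).
    """
    xs = [math.log10(c) for c in avg_counts]
    ys = [math.log10(p) for p in p_values]
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three controls")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    if b >= 0:
        raise ValueError("p-values do not decrease with abundance; no LODR")
    resid_sd = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    # Solve a + z*resid_sd + b*log10(count) = log10(p_threshold) for count.
    log_count = (math.log10(p_threshold) - a - z * resid_sd) / b
    return 10 ** log_count


# Hypothetical 4:1 controls: p-values fall tenfold per tenfold rise in counts, with noise.
counts = [1.0, 10.0, 100.0, 1000.0, 10000.0]
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
p_vals = [10 ** (-math.log10(c) + e) for c, e in zip(counts, noise)]
print(round(estimate_lodr(counts, p_vals, p_threshold=0.01), 1))
```

In practice the threshold would come from `bh_p_threshold` applied to all DE-test p-values at the chosen FDR (0.1 on this slide), and the LODR would be estimated separately for each nominal ratio, such as 4:1.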

Page 27

LODR: Limit of Detection of Ratios

LODR provides:

• Specified confidence in the differentially expressed transcripts above the LODR (90% chance of <10% FDR)

• Guidance for experimental design: increase the signal for transcripts so that they fall above the LODR estimate

[Figure: Rat Experiment, Reference RNA; y-axis: DE Test P-values; x-axis: Average Counts]

Page 28

[Figure: Rat Experiment MA plot with the 4:1 LODR marked; y-axis: Log2 Ratio of Normalized Counts; x-axis: Log2 Normalized Average Counts]

Page 29

[Figure: Rat Experiment MA plot with the 4:1 LODR marked and asterisk annotations; y-axis: Log2 Ratio of Normalized Counts; x-axis: Log2 Normalized Average Counts]

Page 30

Increased sequencing depth shifts endogenous transcript ratio measurements above the LODR

[Figure: Rat Experiment MA plot with the 4:1 LODR marked and asterisk annotations; y-axis: Log2 Ratio of Normalized Counts; x-axis: Log2 Normalized Average Counts]
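Much of this effect is arithmetic. If counts scale in proportion to the number of reads, going from ~40 million to ~260 million reads per replicate (the depths quoted on Page 21) multiplies every transcript's expected count by 6.5, moving low-count transcripts past a fixed count threshold. The sketch assumes the LODR, expressed in average counts, stays the same at the new depth. The helper `fraction_above_lodr`, the LODR of 20 counts and the transcript counts are all hypothetical.

```python
def fraction_above_lodr(avg_counts, lodr, reads_now, reads_planned):
    """Fraction of transcripts whose projected average count is at or above `lodr`.

    Assumptions: counts scale in proportion to sequencing depth, and the LODR,
    expressed in average counts, does not change with depth.
    """
    scale = reads_planned / reads_now
    projected = [count * scale for count in avg_counts]
    return sum(count >= lodr for count in projected) / len(projected)


# Hypothetical transcript counts at ~40 million reads and a hypothetical LODR of 20 counts.
counts_40m = [2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
print(fraction_above_lodr(counts_40m, 20.0, 40e6, 40e6))   # 0.5
print(fraction_above_lodr(counts_40m, 20.0, 40e6, 260e6))  # 0.8333333333333334
```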

Page 31

What are the LODR estimates for my experiment?

[Figure: Rat Experiment and Interlaboratory Experiment panels; y-axis: DE Test P-values; x-axis: Average Counts]

Page 32

How do the endogenous samples relate to the LODR?

[Figure: MA plots for the Rat Experiment and Interlaboratory Experiment, each with the 4:1 LODR marked; y-axis: Log2 Ratio of Normalized Counts; x-axis: Log2 Normalized Average Counts]

Page 33

How much technical variability & bias is there?

[Figure: Rat Experiment and Interlaboratory Experiment panels, annotated "Significant Ratio Bias" and "Decreased Variability"; y-axis: Log2 Ratio of Normalized Counts]

Page 34

mRNA Fraction Differences Between Samples Contribute to Bias in ERCC Ratios

[Figure: total RNA for Sample 1 and Sample 2 (mRNA, rRNA, spike-in), followed by the same samples after mRNA enrichment (mRNA, spike-in)]

The RNA fractions are exaggerated for illustration purposes.
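The diagram's point can be checked with a toy calculation. If both samples receive the same spike-in amount per microgram of total RNA, the sample with more mRNA dilutes its spike-ins more after enrichment, so its library-size-normalized ERCC counts shrink relative to the other sample's. The sketch encodes that under the simplifying assumptions listed in its docstring. The helper `apparent_ercc_ratio` and all fractions are hypothetical and, as in the figure, exaggerated.

```python
def apparent_ercc_ratio(nominal_ratio, mrna_frac_1, mrna_frac_2, spike_frac):
    """ERCC ratio seen after mRNA enrichment and library-size normalization.

    Simplified assumptions (ours, for illustration):
    - each sample gets the same spike-in mass, spike_frac, per unit of total RNA;
    - enrichment keeps all mRNA and all spike-in, and removes everything else;
    - counts are normalized to the total size of each enriched library.
    """
    # Share of each enriched library that is spike-in.
    share_1 = spike_frac / (mrna_frac_1 + spike_frac)
    share_2 = spike_frac / (mrna_frac_2 + spike_frac)
    return nominal_ratio * share_1 / share_2


# Hypothetical fractions: equal mRNA content, then sample 1 with twice the mRNA fraction.
print(apparent_ercc_ratio(4.0, 0.02, 0.02, 0.001))  # 4.0
print(apparent_ercc_ratio(4.0, 0.04, 0.02, 0.001))  # about 2.05, i.e. 4 * 0.021 / 0.041
```

Every control is scaled by the same factor, (f2 + s) / (f1 + s), which is why a single mRNA-fraction term can describe a shared ratio bias across all subpools.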

Page 35

• Dynamic range

• AUC: diagnostic performance

• LODR: Limit of Detection of Ratios

• Variability, bias, LODR & sample transcripts

Page 36

EVALUATE REPRODUCIBILITY ACROSS LABORATORIES

Page 37

[Figure: examples labeled Good Performance and Poor Performance]

Page 38

Interlaboratory analysis using erccdashboard performance metrics

• Labs 1-6: Illumina + poly-A selection (Illumina kit)

• Labs 7-9: Life Tech + poly-A selection (Life Tech kit)

• Labs 10-12: Illumina + ribosomal RNA depletion

Page 39

Consistent LODR across 11 of 12 labs

• Diagnostic performance was consistent within and amongst measurement processes

• Lab 7 was an outlier for diagnostic performance

• LODR agreed with AUC

[Figure: LODR (Average Counts) by Laboratory]

Page 40

Ratio bias is highly variable amongst experiments

• Ratio bias (r_m) can be attributed to the mRNA fraction difference between samples, using:
  – R_s = nominal subpool ratio
  – (E1/E2)_s = empirical ratio

• Large standard errors indicate that mRNA fraction isn’t the only factor contributing to ERCC ratio bias
  – mRNA enrichment protocol is a factor…

[Figure: Log(r_m) by Laboratory, with the mRNA fraction difference reported by Shippy et al. 2006]
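The equation linking r_m, R_s and (E1/E2)_s is not reproduced in this transcript. As a hedged sketch, assume a multiplicative model, (E1/E2)_s ≈ r_m × R_s. Then log(r_m) can be estimated as the mean of log((E1/E2)_s) - log(R_s) over controls, and the standard error of that mean shows how well one shared factor explains the bias. This simple averaging, the helper `estimate_log_rm` and the example ratios are our illustration, not necessarily the estimator used by erccdashboard.

```python
import math
from statistics import fmean, stdev


def estimate_log_rm(empirical_ratios, nominal_ratios):
    """Estimate log(r_m), assuming empirical ratio ~= r_m * nominal ratio for each control.

    Returns (mean log offset, standard error of that mean), using natural logs.
    """
    offsets = [math.log(e) - math.log(r) for e, r in zip(empirical_ratios, nominal_ratios)]
    if len(offsets) < 2:
        raise ValueError("need at least two controls")
    return fmean(offsets), stdev(offsets) / math.sqrt(len(offsets))


# Hypothetical controls from four subpools (nominal ratios 4, 1, 2/3, 1/2).
empirical = [7.6, 2.1, 1.4, 0.95]
nominal = [4.0, 1.0, 2.0 / 3.0, 0.5]
log_rm, se = estimate_log_rm(empirical, nominal)
print(f"r_m ~ {math.exp(log_rm):.2f}, SE of log(r_m) = {se:.3f}")
```

A large standard error relative to the mean offset is the pattern the slide describes: the controls do not share a single scaling factor, so something beyond mRNA fraction, such as the enrichment protocol, is acting on individual controls.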

Page 41

Protocol-dependent bias from poly-A selection affects ERCC controls due to short poly-A tails

[Figure panels: Labs 1-6 ILM Poly-A; Labs 7-9 LIF Poly-A; Labs 10-12 ILM Ribo]

Page 42

mRNA enrichment protocol biases vary across individual ERCCs but are consistent for a protocol

Page 47

Results of the erccdashboard publication

• Ratio performance measures for any technology platform and any experiment:
  – Diagnostic power
  – Novel LODR metric
  – Technical variability & bias

• Comparison across experiments

• Quantification of mRNA fraction differences between samples

• Demonstration of protocol-dependent bias

Page 48

Overview

• External RNA Controls Consortium (ERCC) RNA spike-in controls

• ‘erccdashboard’ analysis tool

• ERCC 2.0: Building an updated suite of RNA controls

Page 49

ERCC 2.0: A New Suite of RNA Controls

• Approached by industry and academia to build new RNA controls

• NIST-hosted open, public ERCC 2.0 workshop
  – Workshop report and presentations available: slideshare.net/ERCC-Workshop

• All interested parties are welcome to participate:
  – Sequence contributions
  – Interlaboratory analysis

• New and improved mRNA mimics

• Transcript isoforms

• miRNA

Page 50

New and Improved mRNA Mimics

• Additional controls

• Expand distributions of RNA control properties
  – Length (> 2 kb)
  – GC content
  – Poly-A tail length
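Expanding these distributions means summarizing the same three properties for every candidate control. A minimal sketch for a DNA template sequence follows. The helper `control_properties` is hypothetical, and its tail rule (every trailing A counts as poly-A tail) is a simplifying assumption, not the ERCC 2.0 design procedure; a body that happens to end in A will slightly overstate the tail.

```python
def control_properties(sequence: str) -> dict:
    """Body length, GC content and 3' poly-A tail length of a control sequence.

    Assumption: every trailing 'A' counts as tail. GC content is computed on
    the body only (the sequence without its tail).
    """
    seq = sequence.upper()
    body = seq.rstrip("A")
    tail = len(seq) - len(body)
    gc = sum(base in "GC" for base in body) / len(body) if body else 0.0
    return {"body_length": len(body), "gc_content": gc, "poly_a_tail": tail}


print(control_properties("GGCCATATAAAAAA"))
# {'body_length': 8, 'gc_content': 0.5, 'poly_a_tail': 6}
```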

Page 51

Transcript Isoform Controls

• Transcript design
  – Non-cognate Spike-in RNA Variants (SIRVs) developed by Lexogen
  – Cognate sequence selection in progress
    • Schizosaccharomyces pombe

• Mixture design
  – Dynamic range: 2^4
  – Design ratios: < 2:1

Lukas Paul, Lexogen

Page 52

Small and miRNA Controls

• Needed for validation of clinical applications
  – Early Detection Research Network
  – TGen

• Other applications relevant to bacterial RNA-Seq

• Non-cognate miRNA controls

• Include some pre-miRNA

• Direct RNA control synthesis by Agilent
  – No need for DNA templates

Karol Thompson, FDA

Page 53

Recap

• External RNA Controls Consortium (ERCC) RNA spike-in controls

• ‘erccdashboard’ analysis tool

• ERCC 2.0: Building an updated suite of RNA controls

Page 54

Acknowledgements

• All External RNA Controls Consortium participants

• NIST
  – Marc Salit
  – Steve Lund
  – P. Scott Pine
  – Justin Zook
  – David Duewer
  – Jerod Parsons
  – Jennifer McDaniel
  – Margaret Klein

• Empa
  – Matthias Roesslein

• SEQC study participants

• Co-authors on the erccdashboard manuscript:
  S. P. Lund, P. S. Pine, H. Binder, D. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Łabaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit

For more information contact: [email protected]